Preface to the Third Edition


The field of matrix computations continues to grow and mature. In 
the Third Edition we have added over 300 new references and 100 new 
problems. The LINPACK and EISPACK citations have been replaced with 
appropriate pointers to LAPACK with key codes tabulated at the beginning 
of appropriate chapters. 

In the First Edition and Second Edition we identified a small number 
of global references: Wilkinson (1965), Forsythe and Moler (1967), Stewart 
(1973), Hanson and Lawson (1974) and Parlett (1980). These volumes are 
as important as ever to the research landscape, but there are some mag- 
nificent new textbooks and monographs on the scene. See The Literature 
section that follows. 

We continue as before with the practice of giving references at the end 
of each section and a master bibliography at the end of the book. 

The earlier editions suffered from a large number of typographical errors 
and we are obliged to the dozens of readers who have brought these to our 
attention. Many corrections and clarifications have been made. 

Here are some specific highlights of the new edition. Chapter 1 (Matrix
Multiplication Problems) and Chapter 6 (Parallel Matrix Computations)
have been completely rewritten with less formality. We think that this 
facilitates the building of intuition for high performance computing and 
draws a better line between algorithm and implementation on the printed 
page. 

In Chapter 2 (Matrix Analysis) we expanded the treatment of CS
decomposition and included a proof. The overview of floating point arithmetic
has been brought up to date. In Chapter 4 (Special Linear Systems) we 
embellished the Toeplitz section with connections to circulant matrices and 
the fast Fourier transform. A subsection on equilibrium systems has been 
included in our treatment of indefinite systems. 

A more accurate rendition of the modified Gram-Schmidt process is 
offered in Chapter 5 (Orthogonalization and Least Squares). Chapter 8 
(The Symmetric Eigenproblem) has been extensively rewritten and rear- 
ranged so as to minimize its dependence upon Chapter 7 (The Unsymmet- 
ric Eigenproblem). Indeed, the coupling between these two chapters is now 
so minimal that it is possible to read either one first. 

In Chapter 9 (Lanczos Methods) we have expanded the discussion of 
the unsymmetric Lanczos process and the Arnoldi iteration. The “unsymmetric
component” of Chapter 10 (Iterative Methods for Linear Systems)
has likewise been broadened with a whole new section devoted to various 
Krylov space methods designed to handle the sparse unsymmetric linear 
system problem. 

In §12.5 (Updating Orthogonal Decompositions) we included a new
subsection on ULV updating. Toeplitz matrix eigenproblems and orthogonal
matrix eigenproblems are discussed in §12.6.

Both of us look forward to continuing the dialog with our readers. As 
we said in the Preface to the Second Edition, "It has been a pleasure to 
deal with such an interested and friendly readership." 

Many individuals made valuable Third Edition suggestions, but Greg 
Ammar, Mike Heath, Nick Trefethen, and Steve Vavasis deserve special 
thanks. 

Finally, we would like to acknowledge the support of Cindy Robinson 
at Cornell. A dedicated assistant makes a big difference. 


Software 


LAPACK 


Many of the algorithms in this book are implemented in the software pack- 
age LAPACK: 


E. Anderson, Z. Bai, C. Bischof, J. Demmel, J. Dongarra, J. Du Croz,
A. Greenbaum, S. Hammarling, A. McKenney, S. Ostrouchov, and D. 
Sorensen (1995). LAPACK Users’ Guide, Release 2.0, 2nd ed., SIAM 
Publications, Philadelphia. 


Pointers to some of the more important routines in this package are given 
at the beginning of selected chapters: 


Chapter 1. Level-1, Level-2, Level-3 BLAS 

Chapter 3. General Linear Systems 

Chapter 4. Positive Definite and Band Systems 

Chapter 5. Orthogonalization and Least Squares Problems 
Chapter 7. The Unsymmetric Eigenvalue Problem 
Chapter 8. The Symmetric Eigenvalue Problem 


Our LAPACK references are spare in detail but rich enough to "get you
started." Thus, when we say that _TRSV can be used to solve a triangular
system $Ax = b$, we leave it to you to discover through the LAPACK manual
that A can be either upper or lower triangular and that the transposed
system $A^Tx = b$ can be handled as well. Moreover, the underscore is a
placeholder whose mission is to designate type (single, double, complex,
etc.).

LAPACK stands on the shoulders of two other packages that are mile- 
stones in the history of software development. EISPACK was developed in 
the early 1970s and is dedicated to solving symmetric, unsymmetric, and 
generalized eigenproblems: 


B.T. Smith, J.M. Boyle, Y. Ikebe, V.C. Klema, and C.B. Moler (1970). 
Matrix Eigensystem Routines: EISPACK Guide, 2nd ed., Lecture Notes
in Computer Science, Volume 6, Springer-Verlag, New York. 
B.S. Garbow, J.M. Boyle, J.J. Dongarra, and C.B. Moler (1972). Matrix
Eigensystem Routines: EISPACK Guide Extension, Lecture Notes in
Computer Science, Volume 51, Springer-Verlag, New York. 


LINPACK was developed in the late 1970s for linear equations and least 
squares problems: 


EISPACK and LINPACK have their roots in a sequence of papers that feature
Algol implementations of some of the key matrix factorizations. These 
papers are collected in 


J.H. Wilkinson and C. Reinsch, eds. (1971). Handbook for Automatic 
Computation, Vol. 2, Linear Algebra, Springer-Verlag, New York. 


NETLIB 


A wide range of software including LAPACK, EISPACK, and LINPACK is 
available electronically via Netlib: 


World Wide Web: http://www.netlib.org/index.html
Anonymous ftp: ftp://ftp.netlib.org 


Via email, send a one-line message: 


mail netlib@ornl.gov
send index 


to get started. 
MATLAB®


Complementing LAPACK and defining a very popular matrix computation
environment is MATLAB:


MATLAB User's Guide, The MathWorks Inc., Natick, Massachusetts. 
M. Marcus (1993). Matrices and MATLAB: A Tutorial, Prentice Hall, Up- 
per Saddle River, NJ. 


R. Pratap (1995). Getting Started with MATLAB, Saunders College
Publishing, Fort Worth, TX.


Many of the problems in Matrix Computations are best posed to students
as MATLAB problems. We make extensive use of MATLAB notation in the
presentation of algorithms.


Selected References 


Each section in the book concludes with an annotated list of references. 
A master bibliography is given at the end of the text. 

Useful books that collectively cover the field are cited below. Chapter
titles are included if appropriate, but do not infer too much from the level
of detail because one author's chapter may be another's subsection. The
citations are classified as follows:


Pre-1970 Classics. Early volumes that set the stage.
Introductory (General). Suitable for the undergraduate classroom.
Advanced (General). Best for practitioners and graduate students.
Analytical. For the supporting mathematics.
Linear Equation Problems. $Ax = b$.
Linear Fitting Problems. $Ax \approx b$.
Eigenvalue Problems. $Ax = \lambda x$.
High Performance. Parallel/vector issues.
Edited Volumes. Useful, thematic collections.


Within each group the entries are specified in chronological order. 
Pre-1970 Classics 


V.N. Faddeeva (1959). Computational Methods of Linear Algebra, Dover, 
New York. 


Basic Material from Linear Algebra. Systems of Linear Equations. The Proper
Numbers and Proper Vectors of a Matrix.


E. Bodewig (1959). Matrix Calculus, North Holland, Amsterdam.


Matrix Calculus. Direct Methods for Linear Equations. Indirect Methods for Linear
Equations. Inversion of Matrices. Geodetic Matrices. Eigenproblems.


R.S. Varga (1962). Matrix Iterative Analysis, Prentice-Hall, Englewood
Cliffs, NJ.

Matrix Properties and Concepts. Nonnegative Matrices. Basic Iterative Methods
and Comparison Theorems. Successive Overrelaxation Iterative Methods.
Semi-Iterative Methods. Derivation and Solution of Elliptic Difference Equations.
Alternating Direction Implicit Iterative Methods. Matrix Methods for Parabolic Partial
Differential Equations. Estimation of Acceleration Parameters.
J.H. Wilkinson (1963). Rounding Errors in Algebraic Processes,
Prentice-Hall, Englewood Cliffs, NJ.


The Fundamental Arithmetic Operations. Computations Involving Polynomials.
Matrix Computations.


A.S. Householder (1964). Theory of Matrices in Numerical Analysis, Blais- 
Normalization and Reduction of the Matrix, Proper Values and Vectors: Successive

L. Fox (1964). An Introduction to Numerical Linear Algebra, Oxford
University Press, Oxford, England.


Introduction, Matrix Algebra. Elimination Methods of Gauss, Jordan, and Aitken. 
Compact Elimination Methods of Doolittle, Crout, Banachiewicz, and Cholesky. 
Orthogonalization Methods. Condition, Accuracy, and Precision, Comparison of 
Methods, Measure of Work, Iterative and Gradient Methods, Iterative methods for 
Notes on Error Analysis for Latent Roots and Vectors.


J.H. Wilkinson (1965). The Algebraic Eigenvalue Problem, Clarendon Press,
Oxford, England.

Theoretical Background. Perturbation Theory. Error Analysis. Solution of
Linear Algebraic Equations. Hermitian Matrices. Reduction of a General Matrix to
Condensed Form. Eigenvalues of Matrices of Condensed Forms. The LR and QR
Algorithms. Iterative Methods.


G.E. Forsythe and C. Moler (1967). Computer Solution of Linear Algebraic 
Systems, Prentice-Hall, Englewood Cliffs, NJ. 


Reader’s Background and Purpose of Book. Vector and Matrix Norms. Diagonal
Form of a Matrix Under Orthogonal Equivalence. Proof of Diagonal Form Theorem.
Types of Computational Problems in Linear Algebra. Types of Matrices
Encountered in Practical Problems. Sources of Computational Problems of Linear Algebra.
Condition of a Linear System. Gaussian Elimination and LU Decomposition. Need
for Interchanging Rows. Scaling Equations and Unknowns. The Crout and
Doolittle Variants. Iterative Improvement. Computing the Determinant. Nearly Singular
Matrices. Algol 60 Program. Fortran, Extended Algol, and PL/I Programs.
Matrix Inversion. An Example: Hilbert Matrices. Floating Point Round-Off Analysis.
Rounding Error in Gaussian Elimination. Convergence of Iterative Improvement.
Positive Definite Matrices; Band Matrices. Iterative Methods for Solving Linear
Systems. Nonlinear Systems of Equations.
Matrix Computations


Chapter 1 


Matrix Multiplication 
Problems 


§1.1 Basic Algorithms and Notation 

§1.2 Exploiting Structure 

§1.3 Block Matrices and Algorithms 

§1.4 Vectorization and Re-Use Issues


The proper study of matrix computations begins with the study of the 
matrix-matrix multiplication problem. Although this problem is simple 
mathematically, it is very rich from the computational point of view. We
begin in §1.1 by looking at the several ways that the matrix multiplica- 
tion problem can be organized. The “language” of partitioned matrices 
is established and used to characterize several linear algebraic “levels” of 
computation. 

If a matrix has structure, then it is usually possible to exploit it. For 
example, a symmetric matrix can be stored in half the space required by a general
matrix. A matrix-vector product that involves a matrix with many zero
entries may require much less time to execute than a full matrix times a 
vector. These matters are discussed in §1.2. 

In §1.3 block matrix notation is established. A block matrix is a matrix
with matrix entries. This concept is very important from the standpoint of
both theory and practice. On the theoretical side, block matrix notation
allows us to prove important matrix factorizations very succinctly. These 
factorizations are the cornerstone of numerical linear algebra. From the 
computational point of view, block algorithms are important because they 






are rich in matrix multiplication, the operation of choice for many new high
performance computer architectures.

These new architectures require the algorithm designer to pay as much
attention to memory traffic as to the actual amount of arithmetic. This
aspect of scientific computation is illustrated in §1.4 where the critical
issues of vector pipeline computing are discussed: stride, vector length, the
number of vector loads and stores, and the level of vector re-use.


Before You Begin 


It is important to be familiar with the MATLAB language. See the
texts by Pratap (1995) and Van Loan (1996). A richer introduction to high
performance matrix computations is given in Dongarra, Duff, Sorensen, and
van der Vorst (1991). This chapter's LAPACK connections include


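LAPACK: Some General Operations

_SCAL     $x \leftarrow \alpha x$                Vector scale
_DOT      $\alpha \leftarrow x^T y$              Dot product
_AXPY     $y \leftarrow \alpha x + y$            Saxpy
_GEMV     $y \leftarrow \alpha A x + \beta y$    Matrix-vector multiplication
_GER      $A \leftarrow A + \alpha x y^T$        Rank-1 update
_GEMM     $C \leftarrow \alpha AB + \beta C$     Matrix multiplication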


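LAPACK: Some Symmetric Operations

_SYMV     $y \leftarrow \alpha A x + \beta y$                     Matrix-vector multiplication
_SPMV     $y \leftarrow \alpha A x + \beta y$                     Matrix-vector multiplication (Packed)
_SYR      $A \leftarrow \alpha x x^T + A$                         Rank-1 update
_SYR2     $A \leftarrow \alpha x y^T + \alpha y x^T + A$          Rank-2 update
_SYRK     $C \leftarrow \alpha A A^T + \beta C$                   Rank-k update
_SYR2K    $C \leftarrow \alpha A B^T + \alpha B A^T + \beta C$    Rank-2k update
_SYMM     $C \leftarrow \alpha A B + \beta C$                     Symmetric/General Product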


1.1 Basic Algorithms and Notation 


Matrix computations are built upon a hierarchy of linear algebraic opera- 
tions. Dot products involve the scalar operations of addition and multipli- 
cation. Matrix-vector multiplication is made up of dot products. Matrix- 
matrix multiplication amounts to a collection of matrix-vector products. 
All of these operations can be described in algorithmic form or in the lan- 
guage of linear algebra. Our primary objective in this section is to show 
how these two styles of expression complement each other. Along the way
we pick up notation and acquaint the reader with the kind of thinking that 
underpins the matrix computation area. The discussion revolves around 
the matrix multiplication problem, a computation that can be organized in 
several ways. 


1.1.1 Matrix Notation


Let $\mathbb{R}$ denote the set of real numbers. We denote the vector space of all
m-by-n real matrices by $\mathbb{R}^{m\times n}$:

$$A \in \mathbb{R}^{m\times n} \iff A = (a_{ij}) = \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mn} \end{bmatrix}, \qquad a_{ij} \in \mathbb{R}.$$

If a capital letter is used to denote a matrix (e.g., A, B, Δ), then the
corresponding lower case letter with subscript ij refers to the (i, j) entry
(e.g., $a_{ij}$, $b_{ij}$, $\delta_{ij}$). As appropriate, we also use the notation $[A]_{ij}$ and
A(i, j) to designate the matrix elements.


1.1.2 Matrix Operations 
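Basic matrix operations include transposition ($\mathbb{R}^{m\times n} \to \mathbb{R}^{n\times m}$),

$$C = A^T \;\Rightarrow\; c_{ij} = a_{ji},$$

addition ($\mathbb{R}^{m\times n} \times \mathbb{R}^{m\times n} \to \mathbb{R}^{m\times n}$),

$$C = A + B \;\Rightarrow\; c_{ij} = a_{ij} + b_{ij},$$

scalar-matrix multiplication ($\mathbb{R} \times \mathbb{R}^{m\times n} \to \mathbb{R}^{m\times n}$),

$$C = \alpha A \;\Rightarrow\; c_{ij} = \alpha a_{ij},$$

and matrix-matrix multiplication ($\mathbb{R}^{m\times p} \times \mathbb{R}^{p\times n} \to \mathbb{R}^{m\times n}$),

$$C = AB \;\Rightarrow\; c_{ij} = \sum_{k=1}^{p} a_{ik} b_{kj}.$$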

These are the building blocks of matrix computations. 
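A minimal sketch of the four operations above, assuming matrices are stored in pure Python as lists of rows and using 0-based indices in place of the text's 1-based subscripts. The function names are hypothetical, chosen only for this illustration; no external library is assumed.

```python
def transpose(A):
    """C = A^T: c_ij = a_ji. A is a list of m rows, each of length n."""
    m, n = len(A), len(A[0])
    return [[A[i][j] for i in range(m)] for j in range(n)]


def add(A, B):
    """C = A + B: c_ij = a_ij + b_ij (A and B must have the same shape)."""
    return [[a + b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(A, B)]


def scale(alpha, A):
    """C = alpha*A: c_ij = alpha*a_ij."""
    return [[alpha * a for a in row] for row in A]


def matmul(A, B):
    """C = AB for A m-by-p and B p-by-n: c_ij = sum over k of a_ik*b_kj."""
    m, p, n = len(A), len(B), len(B[0])
    if len(A[0]) != p:
        raise ValueError("inner dimensions must agree")
    return [[sum(A[i][k] * B[k][j] for k in range(p)) for j in range(n)]
            for i in range(m)]


print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```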
1.1.3 Vector Notation

Let $\mathbb{R}^n$ denote the vector space of real n-vectors:

$$x \in \mathbb{R}^n \iff x = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}, \qquad x_i \in \mathbb{R}.$$

We refer to $x_i$ as the ith component of x. Depending upon context, the
alternative notations $[x]_i$ and x(i) are sometimes used.

Notice that we are identifying $\mathbb{R}^n$ with $\mathbb{R}^{n\times 1}$ and so the members of
$\mathbb{R}^n$ are column vectors. On the other hand, the elements of $\mathbb{R}^{1\times n}$ are row
vectors:

$$x \in \mathbb{R}^{1\times n} \iff x = (x_1, \ldots, x_n).$$

If x is a column vector, then $y = x^T$ is a row vector.


1.1.4 Vector Operations


Assume $a \in \mathbb{R}$, $x \in \mathbb{R}^n$, and $y \in \mathbb{R}^n$. Basic vector operations include
scalar-vector multiplication,

$$z = ax \;\Rightarrow\; z_i = a x_i,$$

vector addition,

$$z = x + y \;\Rightarrow\; z_i = x_i + y_i,$$

the dot product (or inner product),

$$c = x^T y \;\Rightarrow\; c = \sum_{i=1}^{n} x_i y_i,$$

and vector multiply (or the Hadamard product)

$$z = x \mathbin{.*} y \;\Rightarrow\; z_i = x_i y_i.$$

Another very important operation which we write in “update form” is the
saxpy:

$$y = ax + y \;\Rightarrow\; y_i = a x_i + y_i.$$

Here, the symbol “=” is being used to denote assignment, not mathematical
equality. The vector y is being updated. The name "saxpy" is used in
LAPACK, a software package that implements many of the algorithms in
this book. One can think of “saxpy” as a mnemonic for "scalar a x plus
y."




1.1.5 The Computation of Dot Products and Saxpys 


We have chosen to express algorithms in a stylized version of the MATLAB 
language. MATLAB is a powerful interactive system that is ideal for matrix 
computation work. We gradually introduce our stylized MATLAB notation 
in this chapter, beginning with an algorithm for computing dot products.


Algorithm 1.1.1 (Dot Product) If $x, y \in \mathbb{R}^n$, then this algorithm
computes their dot product $c = x^Ty$.

    c = 0
    for i = 1:n
        c = c + x(i)y(i)
    end
The dot product of two n-vectors involves n multiplications and n additions.
It is an “O(n)” operation, meaning that the amount of work is linear in 
the dimension. The saxpy computation is also an O(n) operation, but it 
returns a vector instead of a scalar. 
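Algorithm 1.1.1 can be transcribed almost line for line. The sketch below is one possible Python rendering (the name `dot` is hypothetical); Python's `range(n)` plays the role of `1:n`, shifted to 0-based indexing.

```python
def dot(x, y):
    """Sketch of Algorithm 1.1.1: return c = x^T y.

    The loop performs n multiplications and n additions, so the work is O(n).
    Python indices run 0..n-1 instead of the text's 1:n.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    c = 0
    for i in range(len(x)):
        c = c + x[i] * y[i]
    return c


print(dot([1, 2, 3], [4, 5, 6]))  # 32
```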


Algorithm 1.1.2 (Saxpy) If $x, y \in \mathbb{R}^n$ and $a \in \mathbb{R}$, then this algorithm
overwrites y with $ax + y$.

    for i = 1:n
        y(i) = ax(i) + y(i)
    end


It must be stressed that the algorithms in this book are encapsulations of
critical computational ideas and not “production codes.”
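In the same spirit, a sketch of Algorithm 1.1.2 that overwrites a Python list `y` in place, mirroring the "overwrites y" convention of the text. The name `saxpy` here is a hypothetical Python function, not the BLAS routine of the same name.

```python
def saxpy(a, x, y):
    """Sketch of Algorithm 1.1.2: overwrite the list y with a*x + y in place."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    for i in range(len(x)):
        y[i] = a * x[i] + y[i]


y = [1, 1, 1]
saxpy(2, [1, 2, 3], y)
print(y)  # [3, 5, 7]
```

Like the dot product, the loop does O(n) work, but its result is the updated vector rather than a scalar.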
1.1.6 Matrix-Vector Multiplication and the Gaxpy 
Suppose $A \in \mathbb{R}^{m\times n}$ and that we wish to compute the update

$$y = Ax + y$$

where $x \in \mathbb{R}^n$ and $y \in \mathbb{R}^m$ are given. This generalized saxpy operation is
referred to as a gaxpy. A standard way that this computation proceeds is
to update the components one at a time:

$$y_i = \sum_{j=1}^{n} a_{ij} x_j + y_i, \qquad i = 1:m.$$


This gives the following algorithm. 


Algorithm 1.1.3 (Gaxpy: Row Version) If $A \in \mathbb{R}^{m\times n}$, $x \in \mathbb{R}^n$, and
$y \in \mathbb{R}^m$, then this algorithm overwrites y with $Ax + y$.

    for i = 1:m
        for j = 1:n
            y(i) = A(i,j)x(j) + y(i)
        end
    end


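An alternative algorithm results if we regard Ax as a linear combination of
A's columns, e.g.,

$$\begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix} \begin{bmatrix} 7 \\ 8 \end{bmatrix} = \begin{bmatrix} 1\cdot 7 + 2\cdot 8 \\ 3\cdot 7 + 4\cdot 8 \\ 5\cdot 7 + 6\cdot 8 \end{bmatrix} = 7\begin{bmatrix} 1 \\ 3 \\ 5 \end{bmatrix} + 8\begin{bmatrix} 2 \\ 4 \\ 6 \end{bmatrix} = \begin{bmatrix} 23 \\ 53 \\ 83 \end{bmatrix}.$$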


Algorithm 1.1.4 (Gaxpy: Column Version) If $A \in \mathbb{R}^{m\times n}$, $x \in \mathbb{R}^n$,
and $y \in \mathbb{R}^m$, then this algorithm overwrites y with $Ax + y$.

    for j = 1:n
        for i = 1:m
            y(i) = A(i,j)x(j) + y(i)
        end
    end


Note that the inner loop in either gaxpy algorithm carries out a saxpy
operation. The column version was derived by rethinking what
matrix-vector multiplication “means” at the vector level, but it could also have
been obtained simply by interchanging the order of the loops in the row
version. In matrix computations, it is important to relate loop interchanges
to the underlying linear algebra.
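The row and column gaxpy algorithms perform exactly the same multiply-adds, only in a different order. The sketch below (pure Python, lists of rows, hypothetical function names) runs both on a 3-by-2 example and checks that they agree; it says nothing about their relative speed, which depends on how the data are laid out in memory.

```python
def gaxpy_row(A, x, y):
    """Sketch of Algorithm 1.1.3: y = Ax + y, i-loop outside (row access of A)."""
    m, n = len(A), len(x)
    for i in range(m):
        for j in range(n):
            y[i] = A[i][j] * x[j] + y[i]


def gaxpy_col(A, x, y):
    """Sketch of Algorithm 1.1.4: the same update with the loops interchanged.

    For each j the inner loop is a saxpy: y = x[j] * (column j of A) + y.
    """
    m, n = len(A), len(x)
    for j in range(n):
        for i in range(m):
            y[i] = A[i][j] * x[j] + y[i]


A = [[1, 2], [3, 4], [5, 6]]
x = [7, 8]
y_row, y_col = [0, 0, 0], [0, 0, 0]
gaxpy_row(A, x, y_row)
gaxpy_col(A, x, y_col)
print(y_row, y_col)  # [23, 53, 83] [23, 53, 83]
```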


1.1.7 Partitioning a Matrix into Rows and Columns

Algorithms 1.1.3 and 1.1.4 access the data in A by row and by column,
respectively. To highlight these orientations more clearly we introduce the
language of partitioned matrices.
From the row point of view, a matrix is a stack of row vectors:

$$A \in \mathbb{R}^{m\times n} \iff A = \begin{bmatrix} r_1^T \\ \vdots \\ r_m^T \end{bmatrix}, \qquad r_k \in \mathbb{R}^n. \qquad (1.1.1)$$

This is called a row partition of A. Thus, if we row partition

$$\begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix},$$

then we are choosing to think of A as a collection of rows with

$$r_1^T = [\,1 \;\; 2\,], \quad r_2^T = [\,3 \;\; 4\,], \quad \text{and} \quad r_3^T = [\,5 \;\; 6\,].$$

With the row partitioning (1.1.1), Algorithm 1.1.3 can be expressed as
follows:

    for i = 1:m
        y(i) = r_i^T x + y(i)
    end


Alternatively, a matrix is a collection of column vectors:

$$A \in \mathbb{R}^{m\times n} \iff A = [\,c_1, \ldots, c_n\,], \qquad c_k \in \mathbb{R}^m. \qquad (1.1.2)$$

We refer to this as a column partition of A. In the 3-by-2 example above, we
thus would set $c_1$ and $c_2$ to be the first and second columns of A, respectively:

$$c_1 = \begin{bmatrix} 1 \\ 3 \\ 5 \end{bmatrix}, \qquad c_2 = \begin{bmatrix} 2 \\ 4 \\ 6 \end{bmatrix}.$$

With (1.1.2) we see that Algorithm 1.1.4 is a saxpy procedure that accesses
A by columns:

    for j = 1:n
        y = x(j)c_j + y
    end

In this context we can regard y as a running vector sum that undergoes
repeated saxpy updates.
1.1.8 The Colon Notation 


A handy way to specify a column or row of a matrix is with the "colon"
notation. If $A \in \mathbb{R}^{m\times n}$, then A(k,:) designates the kth row, i.e.,

$$A(k,:) = [\,a_{k1}, \ldots, a_{kn}\,].$$

The kth column is specified by

$$A(:,k) = \begin{bmatrix} a_{1k} \\ \vdots \\ a_{mk} \end{bmatrix}.$$

With these conventions we can rewrite Algorithms 1.1.3 and 1.1.4 as

    for i = 1:m
        y(i) = A(i,:)x + y(i)
    end

and

    for j = 1:n
        y = x(j)A(:,j) + y
    end
respectively. With the colon notation we are able to suppress iteration 
details. This frees us to think at the vector level and focus on larger com- 
putational issues. 


1.1.9 The Outer Product Update 


As a preliminary application of the colon notation, we use it to understand 
the outer product update 


$$A = A + xy^T, \qquad A \in \mathbb{R}^{m\times n}, \; x \in \mathbb{R}^m, \; y \in \mathbb{R}^n.$$

The outer product operation $xy^T$ “looks funny” but is perfectly legal, e.g.,

$$\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} \begin{bmatrix} 4 & 5 \end{bmatrix} = \begin{bmatrix} 4 & 5 \\ 8 & 10 \\ 12 & 15 \end{bmatrix}.$$

This is because $xy^T$ is the product of two “skinny” matrices and the number
of columns in the left matrix x equals the number of rows in the right matrix
$y^T$. The entries in the outer product update are prescribed by

    for i = 1:m
        for j = 1:n
            a_ij = a_ij + x_i y_j
        end
    end

The mission of the j-loop is to add a multiple of $y^T$ to the ith row of A,
i.e.,

    for i = 1:m
        A(i,:) = A(i,:) + x(i)y^T
    end

On the other hand, if we make the i-loop the inner loop, then its task is to
add a multiple of x to the jth column of A:

    for j = 1:n
        A(:,j) = A(:,j) + y(j)x
    end

Note that both outer product algorithms amount to a set of saxpy updates.
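A sketch of the two outer product loop orders, under the same assumptions as before (pure Python, lists of rows, 0-based indices, hypothetical names). Applied to a 3-by-2 zero matrix with x = (1, 2, 3) and y = (4, 5), both produce the same rank-1 matrix.

```python
def outer_update_by_rows(A, x, y):
    """A = A + x y^T with the j-loop inside: row i of A gets x[i] * y^T added."""
    for i in range(len(x)):
        for j in range(len(y)):
            A[i][j] = A[i][j] + x[i] * y[j]


def outer_update_by_columns(A, x, y):
    """A = A + x y^T with the i-loop inside: column j of A gets y[j] * x added."""
    for j in range(len(y)):
        for i in range(len(x)):
            A[i][j] = A[i][j] + y[j] * x[i]


A1 = [[0, 0], [0, 0], [0, 0]]
A2 = [[0, 0], [0, 0], [0, 0]]
outer_update_by_rows(A1, [1, 2, 3], [4, 5])
outer_update_by_columns(A2, [1, 2, 3], [4, 5])
print(A1)  # [[4, 5], [8, 10], [12, 15]]
```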
1.1.10 Matrix-Matrix Multiplication


Consider the 2-by-2 matrix-matrix multiplication AB. In the dot product 
formulation each entry is computed as a dot product: 


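$$\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \begin{bmatrix} 5 & 6 \\ 7 & 8 \end{bmatrix} = \begin{bmatrix} 1\cdot 5 + 2\cdot 7 & 1\cdot 6 + 2\cdot 8 \\ 3\cdot 5 + 4\cdot 7 & 3\cdot 6 + 4\cdot 8 \end{bmatrix}.$$

In the saxpy version each column in the product is regarded as a linear
combination of columns of A:

$$\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \begin{bmatrix} 5 & 6 \\ 7 & 8 \end{bmatrix} = \left[\; 5\begin{bmatrix} 1 \\ 3 \end{bmatrix} + 7\begin{bmatrix} 2 \\ 4 \end{bmatrix}, \;\; 6\begin{bmatrix} 1 \\ 3 \end{bmatrix} + 8\begin{bmatrix} 2 \\ 4 \end{bmatrix} \;\right].$$

Finally, in the outer product version, the result is regarded as the sum of
outer products:

$$\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \begin{bmatrix} 5 & 6 \\ 7 & 8 \end{bmatrix} = \begin{bmatrix} 1 \\ 3 \end{bmatrix} \begin{bmatrix} 5 & 6 \end{bmatrix} + \begin{bmatrix} 2 \\ 4 \end{bmatrix} \begin{bmatrix} 7 & 8 \end{bmatrix}.$$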


Although equivalent mathematically, it turns out that these versions of
matrix multiplication can have very different levels of performance because
of their memory traffic properties. This matter is pursued in §1.4. For now,
it is worth detailing the above three approaches to matrix multiplication
because it gives us a chance to review notation and to practice thinking at
different linear algebraic levels.


1.1.11 Scalar-Level Specifications 

To fix the discussion we focus on the following matrix multiplication update:

$$C = AB + C, \qquad A \in \mathbb{R}^{m\times p}, \; B \in \mathbb{R}^{p\times n}, \; C \in \mathbb{R}^{m\times n}.$$

The starting point is the familiar triply nested loop algorithm:

Algorithm 1.1.5 (Matrix Multiplication: ijk Variant) If $A \in \mathbb{R}^{m\times p}$,
$B \in \mathbb{R}^{p\times n}$, and $C \in \mathbb{R}^{m\times n}$ are given, then this algorithm overwrites C with
$AB + C$.

    for i = 1:m
        for j = 1:n
            for k = 1:p
                C(i,j) = A(i,k)B(k,j) + C(i,j)
            end
        end
    end
This is the “ijk variant” because we identify the rows of C (and A) with i,
the columns of C (and B) with j, and the summation index with k.

We consider the update C = AB + C instead of just C = AB for two 
reasons. We do not have to bother with C = 0 initializations and updates 
of the form C = AB + C arise more frequently in practice. 

The three loops in the matrix multiplication update can be arbitrarily
ordered, giving 3! = 6 variations. Thus,

    for j = 1:n
        for k = 1:p
            for i = 1:m
                C(i,j) = A(i,k)B(k,j) + C(i,j)
            end
        end
    end


is the jki variant. Each of the six possibilities (ijk, jik, ikj, jki, kij,
kji) features an inner loop operation (dot product or saxpy) and has its
own pattern of data flow. For example, in the ijk variant, the inner loop
oversees a dot product that requires access to a row of A and a column of
B. The jki variant involves a saxpy that requires access to a column of C
and a column of A. These attributes are summarized in Table 1.1.1 along
with an interpretation of what is going on when the middle and inner loops
are considered together. Each variant involves the same amount of floating
point arithmetic, but accesses the A, B, and C data differently.

Loop    Inner    Middle                  Inner Loop
Order   Loop     Loop                    Data Access
ijk     dot      vector × matrix         A by row, B by column
jik     dot      matrix × vector         A by row, B by column
ikj     saxpy    row gaxpy               B by row, C by row
jki     saxpy    column gaxpy            A by column, C by column
kij     saxpy    row outer product       B by row, C by row
kji     saxpy    column outer product    A by column, C by column

TABLE 1.1.1. Matrix Multiplication: Loop Orderings and Properties
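To make Table 1.1.1 concrete, the sketch below drives a single triply nested loop from a string naming the loop order, so all six variants can be run and compared. It assumes pure Python lists of rows and integer data, so the comparison is exact; it illustrates only that the variants compute the same result, since interpreted Python does not exhibit the memory-traffic differences that distinguish them in compiled code. The name `matmul_update` is hypothetical.

```python
from itertools import permutations


def matmul_update(A, B, C, order):
    """Overwrite C with AB + C using the loop nest named by `order`.

    `order` is a permutation of "ijk" such as "jki"; its first letter names
    the outermost loop. A is m-by-p, B is p-by-n, C is m-by-n (lists of rows).
    """
    m, p, n = len(A), len(B), len(B[0])
    extent = {"i": range(m), "j": range(n), "k": range(p)}
    outer, middle, inner = order
    for a in extent[outer]:
        for b in extent[middle]:
            for c in extent[inner]:
                idx = {outer: a, middle: b, inner: c}
                i, j, k = idx["i"], idx["j"], idx["k"]
                C[i][j] = A[i][k] * B[k][j] + C[i][j]
    return C


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
results = {}
for perm in permutations("ijk"):
    name = "".join(perm)
    results[name] = matmul_update(A, B, [[1, 1], [1, 1]], name)
print(results["ijk"])  # [[20, 23], [44, 51]]
```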


1.1.12 A Dot Product Formulation 


The usual matrix multiplication procedure regards AB as an array of dot 
products to be computed one at a time in left-to-right, top-to-bottom order. 
This is the idea behind Algorithm 1.1.5. Using the colon notation we can
highlight this dot-product formulation:


Algorithm 1.1.6 (Matrix Multiplication: Dot Product Version)
If $A \in \mathbb{R}^{m\times p}$, $B \in \mathbb{R}^{p\times n}$, and $C \in \mathbb{R}^{m\times n}$ are given, then this algorithm
overwrites C with $AB + C$.

    for i = 1:m
        for j = 1:n
            C(i,j) = A(i,:)B(:,j) + C(i,j)
        end
    end

In the language of partitioned matrices, if

$$A = \begin{bmatrix} a_1^T \\ \vdots \\ a_m^T \end{bmatrix}, \qquad a_k \in \mathbb{R}^p$$

and

$$B = [\,b_1, \ldots, b_n\,], \qquad b_k \in \mathbb{R}^p,$$

then Algorithm 1.1.6 has this interpretation:

    for i = 1:m
        for j = 1:n
            c_ij = a_i^T b_j + c_ij
        end
    end

Note that the “mission” of the j-loop is to compute the ith row of the
update. To emphasize this we could write

    for i = 1:m
        c_i^T = a_i^T B + c_i^T
    end

where

$$C = \begin{bmatrix} c_1^T \\ \vdots \\ c_m^T \end{bmatrix}$$

is a row partitioning of C. To say the same thing with the colon notation
we write

    for i = 1:m
        C(i,:) = A(i,:)B + C(i,:)
    end
Either way we see that the inner two loops of the ijk variant define a 
row-oriented gaxpy operation. 
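The row-oriented view can be sketched by computing each row of the update as a vector-matrix product $a_i^TB$. The helper names below are hypothetical, and matrices are again assumed to be Python lists of rows.

```python
def row_times_matrix(a, B):
    """Return the row vector a^T B: entry j is the dot product of a with column j of B."""
    p, n = len(B), len(B[0])
    return [sum(a[k] * B[k][j] for k in range(p)) for j in range(n)]


def matmul_by_rows(A, B, C):
    """Dot product version grouped by rows: C(i,:) = A(i,:) B + C(i,:)."""
    for i in range(len(A)):
        r = row_times_matrix(A[i], B)
        C[i] = [r[j] + C[i][j] for j in range(len(r))]
    return C


print(matmul_by_rows([[1, 2], [3, 4]], [[5, 6], [7, 8]], [[0, 0], [0, 0]]))
# [[19, 22], [43, 50]]
```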
1.1.13 A Saxpy Formulation

Suppose A and C are column-partitioned as follows:

$$A = [\,a_1, \ldots, a_p\,], \qquad a_k \in \mathbb{R}^m,$$

$$C = [\,c_1, \ldots, c_n\,], \qquad c_j \in \mathbb{R}^m.$$

By comparing jth columns in $C = AB + C$ we see that

$$c_j = \sum_{k=1}^{p} b_{kj} a_k + c_j, \qquad j = 1:n.$$

These vector sums can be put together with a sequence of saxpy updates.


Algorithm 1.1.7 (Matrix Multiplication: Saxpy Version) If the
matrices $A \in \mathbb{R}^{m\times p}$, $B \in \mathbb{R}^{p\times n}$, and $C \in \mathbb{R}^{m\times n}$ are given, then this algorithm
overwrites C with $AB + C$.

    for j = 1:n
        for k = 1:p
            C(:,j) = A(:,k)B(k,j) + C(:,j)
        end
    end

Note that the k-loop oversees a gaxpy operation:

    for j = 1:n
        C(:,j) = AB(:,j) + C(:,j)
    end
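Similarly, the saxpy formulation can be sketched column by column: each column of C is copied out, receives p saxpy updates with the columns of A, and is written back. This is an illustration under the same list-of-rows assumption, not an efficient implementation; the function name is hypothetical.

```python
def matmul_by_columns(A, B, C):
    """Saxpy version: column j of C is updated by p saxpys,
    c_j = b_kj * a_k + c_j for k = 1..p, where a_k is column k of A."""
    m, p, n = len(A), len(B), len(B[0])
    for j in range(n):
        cj = [C[i][j] for i in range(m)]                  # copy out column j of C
        for k in range(p):
            ak = [A[i][k] for i in range(m)]              # column k of A
            cj = [B[k][j] * ak[i] + cj[i] for i in range(m)]  # one saxpy
        for i in range(m):
            C[i][j] = cj[i]                               # store column j back
    return C
```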
1.1.14 An Outer Product Formulation 
Consider the kji variant of Algorithm 1.1.5:

    for k = 1:p
        for j = 1:n
            for i = 1:m
                C(i,j) = A(i,k)B(k,j) + C(i,j)
            end
        end
    end

The inner two loops oversee the outer product update

$$C = a_k b_k^T + C$$

where

$$A = [\,a_1, \ldots, a_p\,] \quad \text{and} \quad B = \begin{bmatrix} b_1^T \\ \vdots \\ b_p^T \end{bmatrix} \qquad (1.1.3)$$

with $a_k \in \mathbb{R}^m$ and $b_k \in \mathbb{R}^n$. We therefore obtain


Algorithm 1.1.8 (Matrix Multiplication: Outer Product Version) 
If $A \in \mathbb{R}^{m\times p}$, $B \in \mathbb{R}^{p\times n}$, and $C \in \mathbb{R}^{m\times n}$ are given, then this algorithm
overwrites C with $AB + C$.

    for k = 1:p
        C = A(:,k)B(k,:) + C
    end


This implementation revolves around the fact that AB is the sum of p outer 
products. 
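The outer product version can be sketched as an explicit accumulation of p rank-1 matrices. For clarity this hypothetical sketch builds each outer product as a full matrix and returns a new list rather than updating C in place, which costs extra storage that a real implementation would avoid.

```python
def outer(x, y):
    """Return the matrix x y^T as a list of rows."""
    return [[xi * yj for yj in y] for xi in x]


def matmul_by_outer_products(A, B, C):
    """Outer product version: C = a_k b_k^T + C for k = 1..p,
    where a_k is column k of A and b_k^T is row k of B."""
    m, p, n = len(A), len(B), len(B[0])
    for k in range(p):
        P = outer([A[i][k] for i in range(m)], B[k])   # rank-1 term a_k b_k^T
        C = [[P[i][j] + C[i][j] for j in range(n)] for i in range(m)]
    return C
```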


1.1.15 The Notion of “Level” 


The dot product and saxpy operations are examples of “level-1” operations. 
Level-1 operations involve an amount of data and an amount of arithmetic 
that is linear in the dimension of the operation. An m-by-n outer product 
update or gaxpy operation involves a quadratic amount of data (O(mn)) 
and a quadratic amount of work (O(mn)). They are examples of “level-2” 
operations. 

The matrix update C = AB +C is a "level-3" operation. Level-3 
operations involve a quadratic amount of data and a cubic amount of work. 
If A, B, and C are n-by-n matrices, then C = AB + C involves $O(n^2)$
matrix entries and $O(n^3)$ arithmetic operations.

The design of matrix algorithms that are rich in high-level linear algebra
operations is a recurring theme in the book. For example, a high
performance linear equation solver may require a level-3 organization of 
Gaussian elimination. This requires some algorithmic rethinking because 
that method is usually specified in level-1 terms, e.g., “multiply row 1 by a 
scalar and add the result to row 2." 


1.1.16 A Note on Matrix Equations 


In striving to understand matrix multiplication via outer products, we es- 
sentially established the matrix equation 


$$AB = \sum_{k=1}^{p} a_k b_k^T$$

where the $a_k$ and $b_k$ are defined by the partitionings in (1.1.3).
Numerous matrix equations are developed in subsequent chapters. Sometimes
they are established algorithmically like the above outer product
expansion and other times they are proved at the ij-component level. As
an example of the latter, we prove an important result that characterizes 
transposes of products. 


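Theorem 1.1.1 If $A \in \mathbb{R}^{m\times p}$ and $B \in \mathbb{R}^{p\times n}$, then $(AB)^T = B^TA^T$.

Proof. If $C = (AB)^T$, then

$$c_{ij} = [(AB)^T]_{ij} = [AB]_{ji} = \sum_{k=1}^{p} a_{jk} b_{ki}.$$

On the other hand, if $D = B^TA^T$, then

$$d_{ij} = [B^TA^T]_{ij} = \sum_{k=1}^{p} [B^T]_{ik} [A^T]_{kj} = \sum_{k=1}^{p} b_{ki} a_{jk}.$$

Since $c_{ij} = d_{ij}$ for all i and j, it follows that C = D. □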


Scalar-level proofs such as this one are usually not very insightful. However, 
they are sometimes the only way to proceed. 
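A numerical spot check is no substitute for the proof above, but it is a cheap way to catch index slips when working at the component level. The sketch compares $(AB)^T$ with $B^TA^T$ for one small rectangular example, using pure Python and hypothetical helper names.

```python
def transpose(A):
    """Return A^T; zip(*A) walks the columns of A."""
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    """Return AB; entry (i, j) is the dot product of row i of A and column j of B."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


A = [[1, 2, 3], [4, 5, 6]]                 # 2-by-3
B = [[1, 0], [2, 1], [0, 3]]               # 3-by-2
lhs = transpose(matmul(A, B))              # (AB)^T
rhs = matmul(transpose(B), transpose(A))   # B^T A^T
print(lhs == rhs, lhs)  # True [[5, 14], [11, 23]]
```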


1.1.17 Complex Matrices 


From time to time computations that involve complex matrices are
discussed. The vector space of m-by-n complex matrices is designated by
$\mathbb{C}^{m\times n}$. The scaling, addition, and multiplication of complex matrices
correspond exactly to the real case. However, transposition becomes conjugate
transposition:

$$C = A^H \;\Rightarrow\; c_{ij} = \bar{a}_{ji}.$$

The vector space of complex n-vectors is designated by $\mathbb{C}^n$. The dot product
of complex n-vectors x and y is prescribed by

$$s = x^H y = \sum_{i=1}^{n} \bar{x}_i y_i.$$

Finally, if $A = B + iC \in \mathbb{C}^{m\times n}$, then we designate the real and imaginary
parts of A by Re(A) = B and Im(A) = C, respectively.
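A short sketch of the conjugate transpose and the complex dot product using Python's built-in complex numbers (the function names are hypothetical). Note that $x^Hx$ is the sum of $|x_i|^2$, so it is real and nonnegative even though Python returns it as a complex value.

```python
def conj_transpose(A):
    """C = A^H: c_ij = conj(a_ji). Entries may be Python int, float, or complex."""
    m, n = len(A), len(A[0])
    return [[A[i][j].conjugate() for i in range(m)] for j in range(n)]


def cdot(x, y):
    """s = x^H y = sum over i of conj(x_i) * y_i."""
    return sum(xi.conjugate() * yi for xi, yi in zip(x, y))


x = [1 + 2j, 3j]
s = cdot(x, x)   # |1+2i|^2 + |3i|^2 = 5 + 9, held as a complex number
print(s.real)    # 14.0
```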
Problems 


P1.1.1 Suppose $A \in \mathbb{R}^{n\times n}$ and $x \in \mathbb{R}^r$ are given. Give a saxpy algorithm for computing
the first column of $M = (A - x_1 I) \cdots (A - x_r I)$.
P1.1.2 In the conventional 2-by-2 matrix multiplication $C = AB$, there are eight
multiplications: $a_{11}b_{11}$, $a_{11}b_{12}$, $a_{21}b_{11}$, $a_{21}b_{12}$, $a_{12}b_{21}$, $a_{12}b_{22}$, $a_{22}b_{21}$, and $a_{22}b_{22}$. Make
a table that indicates the order that these multiplications are performed for the ijk, jik,
kij, ikj, jki, and kji matrix multiply algorithms.

P1.1.3 Give an algorithm for computing $C = (xy^T)^k$ where x and y are n-vectors.
P1.1.4 Specify an algorithm for computing $(XY^T)^k$ where $X, Y \in \mathbb{R}^{n\times 2}$.


P1.1.5 Formulate an outer product algorithm for the update $C = AB^T + C$ where
$A \in \mathbb{R}^{m\times r}$, $B \in \mathbb{R}^{n\times r}$, and $C \in \mathbb{R}^{m\times n}$.


P1.1.6 Suppose we have real n-by-n matrices C, D, E, and F. Show how to compute
real n-by-n matrices A and B with just three real n-by-n matrix multiplications so that
$A + iB = (C + iD)(E + iF)$. Hint: Compute $W = (C + D)(E - F)$.


Notes and References for Sec. 1.1 


It must be stressed that the development of quality software from any of our
“semi-formal” algorithmic presentations is a long and arduous task. Even the implementation
of the level-1, 2, and 3 BLAS requires care:


C.L. Lawson, R.J. Hanson, D.R. Kincaid, and F.T. Krogh (1979). “Basic Linear
Algebra Subprograms for FORTRAN Usage,” ACM Trans. Math. Soft. 5, 308-323.

C.L. Lawson, R.J. Hanson, D.R. Kincaid, and F.T. Krogh (1979). “Algorithm 539,
Basic Linear Algebra Subprograms for FORTRAN Usage,” ACM Trans. Math. Soft.
5, 324-325.

J.J. Dongarra, J. Du Croz, S. Hammarling, and R.J. Hanson (1988). “An Extended Set
of Fortran Basic Linear Algebra Subprograms,” ACM Trans. Math. Soft. 14, 1-17.

J.J. Dongarra, J. Du Croz, S. Hammarling, and R.J. Hanson (1988). “Algorithm 656: An
Extended Set of Fortran Basic Linear Algebra Subprograms: Model Implementation
and Test Programs,” ACM Trans. Math. Soft. 14, 18-32.

J.J. Dongarra, J. Du Croz, I.S. Duff, and S.J. Hammarling (1990). “A Set of Level 3
Basic Linear Algebra Subprograms,” ACM Trans. Math. Soft. 16, 1-17.

J.J. Dongarra, J. Du Cros, LS. Duff, and S.J. Hammarling (1990). “Algorithm 679. A 
Set of Level 3 Basic Linear Algebra Subprograms: Model Implementation and Test 
Programs," ACM Trans. Math. Soft. 16, 18-28. 


Other BLAS references include 


B. Kågström, P. Ling, and C. Van Loan (1991). “High-Performance Level-3 BLAS:
Sample Routines for Double Precision Real Data,” in High Performance Computing
II, M. Durand and F. El Dabaghi (eds), North-Holland, 269-281.

B. Kågström, P. Ling, and C. Van Loan (1995). “GEMM-Based Level-3 BLAS: High-
Performance Model Implementations and Performance Evaluation Benchmark,” in
Parallel Programming and Applications, P. Fritzson and L. Finmo (eds), IOS Press,
184-188.


For an appreciation of the subtleties associated with software development we recommend 


J.R. Rice (1981). Matrix Computations and Mathematical Software, Academic Press,
New York.


and a browse through the LAPACK manual. 
1.2 Exploiting Structure


The efficiency of a given matrix algorithm depends on many things. The most
obvious, and the one we treat in this section, is the amount of arithmetic
and storage required. We continue to use matrix-vector and matrix-matrix
multiplication as a vehicle for introducing the key ideas. As examples of 
exploitable structure we have chosen the properties of bandedness and sym- 
metry. Band matrices have many zero entries and so it is no surprise that 
band matrix manipulation allows for many arithmetic and storage short- 
cuts. Arithmetic complexity and data structures are discussed in this con- 
text. 

Symmetric matrices provide another set of examples that can be used to 
illustrate structure exploitation. Symmetric linear systems and eigenvalue 
problems have a very prominent role to play in matrix computations and 
so it is important to be familiar with their manipulation. 


1.2.1 Band Matrices and the x-0 Notation


We say that $A \in \mathbb{R}^{m \times n}$ has lower bandwidth p if $a_{ij} = 0$ whenever $i > j + p$
and upper bandwidth q if $j > i + q$ implies $a_{ij} = 0$. Here is an example of
an 8-by-5 matrix that has lower bandwidth 1 and upper bandwidth 2:


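$$
\begin{bmatrix}
\times & \times & \times & 0 & 0 \\
\times & \times & \times & \times & 0 \\
0 & \times & \times & \times & \times \\
0 & 0 & \times & \times & \times \\
0 & 0 & 0 & \times & \times \\
0 & 0 & 0 & 0 & \times \\
0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0
\end{bmatrix}
$$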


The x's designate arbitrary nonzero entries. This notation is handy to
indicate the zero-nonzero structure of a matrix and we use it extensively. 
Band structures that occur frequently are tabulated in Table 1.2.1. 


1.2.2 Diagonal Matrix Manipulation


Matrices with upper and lower bandwidth zero are diagonal. If $D \in \mathbb{R}^{m \times n}$
is diagonal, then

$$D = \mathrm{diag}(d_1, \ldots, d_q), \quad q = \min\{m, n\} \iff d_i = d_{ii}.$$


If D is diagonal and A is a matrix, then DA is a row scaling of A and AD 
is a column scaling of A. 
Type of Matrix        Lower Bandwidth    Upper Bandwidth
diagonal              0                  0
upper triangular      0                  n-1
lower triangular      m-1                0
tridiagonal           1                  1
upper bidiagonal      0                  1
lower bidiagonal      1                  0
upper Hessenberg      1                  n-1
lower Hessenberg      m-1                1

TABLE 1.2.1. Band Terminology for m-by-n Matrices


1.2.3 Triangular Matrix Multiplication 


To introduce band matrix “thinking” we look at the matrix multiplication 
problem C = AB when A and B are both n-by-n and upper triangular. 
The 3-by-3 case is illuminating: 


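$$
C = \begin{bmatrix}
a_{11}b_{11} & a_{11}b_{12} + a_{12}b_{22} & a_{11}b_{13} + a_{12}b_{23} + a_{13}b_{33} \\
0 & a_{22}b_{22} & a_{22}b_{23} + a_{23}b_{33} \\
0 & 0 & a_{33}b_{33}
\end{bmatrix}
$$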


It suggests that the product is upper triangular and that its upper triangular
entries are the result of abbreviated inner products. Indeed, since
$a_{ik}b_{kj} = 0$ whenever $k < i$ or $j < k$, we see that

$$c_{ij} = \sum_{k=i}^{j} a_{ik}b_{kj},$$


and so we obtain: 


Algorithm 1.2.1 (Triangular Matrix Multiplication) If $A, B \in \mathbb{R}^{n \times n}$
are upper triangular, then this algorithm computes C = AB.


    C = 0
    for i = 1:n
       for j = i:n
          for k = i:j
             C(i, j) = A(i, k)B(k, j) + C(i, j)
          end
       end
    end
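As a minimal sketch, assuming matrices are plain Python lists of rows with 0-based indices (the function names tri_matmul and full_matmul are illustrative, not from the text), Algorithm 1.2.1 can be written directly; full_matmul is only a conventional reference product for checking.

```python
def tri_matmul(A, B):
    """Return C = AB for upper triangular n-by-n A and B (lists of rows).

    Follows Algorithm 1.2.1 with 0-based indices: only c_ij with i <= j is
    formed, and the inner sum runs over k = i..j because a_ik * b_kj = 0
    whenever k < i or j < k.
    """
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            for k in range(i, j + 1):
                C[i][j] = A[i][k] * B[k][j] + C[i][j]
    return C


def full_matmul(A, B):
    """Conventional product, used only as a reference for checking."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


A = [[1.0, 2.0, 3.0], [0.0, 4.0, 5.0], [0.0, 0.0, 6.0]]
B = [[2.0, 1.0, 0.0], [0.0, 3.0, 1.0], [0.0, 0.0, 1.0]]
C = tri_matmul(A, B)   # upper triangular, agrees with full_matmul(A, B)
```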
To quantify the savings in this algorithm we need some tools for measuring
the amount of work. 


1.2.4 Flops 


Obviously, upper triangular matrix multiplication involves less arithmetic 
than when the matrices are full. One way to quantify this is with the notion 
of a flop. A flop¹ is a floating point operation. A dot product or saxpy
operation of length n involves 2n flops because there are n multiplications 
and n adds in either of these vector operations. 

The gaxpy $y = Ax + y$ where $A \in \mathbb{R}^{m \times n}$ involves 2mn flops as does an
m-by-n outer product update of the form $A = A + xy^T$.

The matrix multiply update $C = AB + C$ where $A \in \mathbb{R}^{m \times p}$, $B \in \mathbb{R}^{p \times n}$,
and $C \in \mathbb{R}^{m \times n}$ involves 2mnp flops.

Flop counts are usually obtained by summing the amount of arithmetic 
associated with the most deeply nested statements in an algorithm. For 
matrix-matrix multiplication, this is the statement, 


    C(i, j) = A(i, k)B(k, j) + C(i, j)


which involves two flops and is executed mnp times as a simple loop ac- 
counting indicates. Hence the conclusion that general matrix multiplication 
requires 2mnp flops. 

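Now let us investigate the amount of work involved in Algorithm 1.2.1.
Note that $c_{ij}$ $(i \le j)$ requires $2(j - i + 1)$ flops. Using the heuristics

$$\sum_{p=1}^{q} p \approx \frac{q^2}{2} \qquad \text{and} \qquad \sum_{p=1}^{q} p^2 \approx \frac{q^3}{3}$$

we find that triangular matrix multiplication requires one-sixth the number
of flops as full matrix multiplication:

$$\sum_{i=1}^{n} \sum_{j=i}^{n} 2(j - i + 1) = \sum_{i=1}^{n} \sum_{j=1}^{n-i+1} 2j \approx \sum_{i=1}^{n} \frac{2(n-i+1)^2}{2} = \sum_{i=1}^{n} i^2 \approx \frac{n^3}{3}.$$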


We throw away the low order terms since their inclusion does not contribute
to what the flop count “says.” For example, an exact flop count of Algorithm
1.2.1 reveals that precisely $n^3/3 + n^2 + 2n/3$ flops are involved. For
large n (the typical situation of interest) we see that the exact flop count
offers no insight beyond the $n^3/3$ approximation.

¹In the first edition of this book we defined a flop to be the amount of work associated
with an operation of the form $a_{ij} = a_{ij} + a_{ik}b_{kj}$, i.e., a floating point add, a floating
point multiply, and some subscripting. Thus, an “old flop” involves two “new flops.” In
defining a flop to be a single floating point operation we are opting for a more precise
measure of arithmetic complexity.

Flop counting is a necessarily crude approach to measuring program
efficiency since it ignores subscripting, memory traffic, and the countless
other overheads associated with program execution. We must not infer
too much from a comparison of flop counts. We cannot conclude, for example,
that triangular matrix multiplication is six times faster than square
matrix multiplication. Flop counting is just a “quick and dirty” accounting
method that captures only one of the several dimensions of the efficiency
issue.
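The exact count quoted for Algorithm 1.2.1 can be checked by brute force. This Python sketch (function names are illustrative) adds up $2(j - i + 1)$ flops for every entry $c_{ij}$ with $i \le j$ and compares the total with $n^3/3 + n^2 + 2n/3$; like any flop count, it measures arithmetic only and says nothing about memory traffic.

```python
def tri_matmul_flops(n):
    """Flops executed by Algorithm 1.2.1 on n-by-n upper triangular inputs.

    The innermost statement costs two flops (one multiply, one add) and runs
    j - i + 1 times for each pair i <= j.
    """
    return sum(2 * (j - i + 1) for i in range(1, n + 1) for j in range(i, n + 1))


def exact_count(n):
    """n^3/3 + n^2 + 2n/3, kept in integer arithmetic: (n^3 + 3n^2 + 2n) / 3."""
    return (n**3 + 3 * n**2 + 2 * n) // 3


for n in (10, 100, 1000):
    # The ratio to the n^3/3 approximation tends to 1 as n grows.
    print(n, tri_matmul_flops(n), exact_count(n), tri_matmul_flops(n) / (n**3 / 3))
```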


1.2.5 The Colon Notation-Again 


The dot product that the k-loop performs in Algorithm 1.2.1 can be succinctly
stated if we extend the colon notation introduced in §1.1.8. Suppose
$A \in \mathbb{R}^{m \times n}$ and the integers p, q, and r satisfy $1 \le p \le q \le n$ and $1 \le r \le m$.
We then define

$$A(r, p{:}q) = [a_{rp}, \ldots, a_{rq}] \in \mathbb{R}^{1 \times (q-p+1)}.$$

Likewise, if $1 \le p \le q \le m$ and $1 \le c \le n$, then

$$A(p{:}q, c) = \begin{bmatrix} a_{pc} \\ \vdots \\ a_{qc} \end{bmatrix} \in \mathbb{R}^{q-p+1}.$$
With this notation we can rewrite Algorithm 1.2.1 as

    C(1:n, 1:n) = 0
    for i = 1:n
       for j = i:n
          C(i, j) = A(i, i:j)B(i:j, j) + C(i, j)
       end
    end
We mention one additional feature of the colon notation. Negative increments
are allowed. Thus, if x and y are n-vectors, then $s = x^T y(n{:}-1{:}1)$
is the summation

$$s = \sum_{i=1}^{n} x_i y_{n-i+1}.$$
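Python's slice syntax happens to support the same negative-increment idea, so a minimal sketch of $s = x^T y(n{:}-1{:}1)$ is short. It assumes x and y are Python lists, and the name reversed_dot is illustrative.

```python
def reversed_dot(x, y):
    """s = x^T y(n:-1:1) = sum_i x_i * y_(n-i+1).

    The Python slice y[::-1] (step -1) plays the role of the colon
    expression n:-1:1 in the text.
    """
    if len(x) != len(y):
        raise ValueError("x and y must be vectors of the same length")
    return sum(xi * yi for xi, yi in zip(x, y[::-1]))


# 1*6 + 2*5 + 3*4 = 28
s = reversed_dot([1, 2, 3], [4, 5, 6])
```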


1.2.6 Band Storage 


Suppose $A \in \mathbb{R}^{n \times n}$ has lower bandwidth p and upper bandwidth q and
assume that p and q are much smaller than n. Such a matrix can be stored
in a $(p + q + 1)$-by-n array A.band with the convention that

$$a_{ij} = \text{A.band}(i - j + q + 1,\ j) \qquad (1.2.1)$$
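for all (i, j) that fall inside the band. Thus, if

$$A = \begin{bmatrix}
a_{11} & a_{12} & a_{13} & 0 & 0 & 0 \\
a_{21} & a_{22} & a_{23} & a_{24} & 0 & 0 \\
0 & a_{32} & a_{33} & a_{34} & a_{35} & 0 \\
0 & 0 & a_{43} & a_{44} & a_{45} & a_{46} \\
0 & 0 & 0 & a_{54} & a_{55} & a_{56} \\
0 & 0 & 0 & 0 & a_{65} & a_{66}
\end{bmatrix},$$

then

$$\text{A.band} = \begin{bmatrix}
0 & 0 & a_{13} & a_{24} & a_{35} & a_{46} \\
0 & a_{12} & a_{23} & a_{34} & a_{45} & a_{56} \\
a_{11} & a_{22} & a_{33} & a_{44} & a_{55} & a_{66} \\
a_{21} & a_{32} & a_{43} & a_{54} & a_{65} & 0
\end{bmatrix}.$$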


Here, the “0” entries are unused. With this data structure, our column- 
oriented gaxpy algorithm transforms to the following: 


Algorithm 1.2.2 (Band Gaxpy) Suppose $A \in \mathbb{R}^{n \times n}$ has lower bandwidth
p and upper bandwidth q and is stored in the A.band format (1.2.1).
If $x, y \in \mathbb{R}^n$, then this algorithm overwrites y with $Ax + y$.


    for j = 1:n
       ytop = max(1, j - q)
       ybot = min(n, j + p)
       atop = max(1, q + 2 - j)
       abot = atop + ybot - ytop
       y(ytop:ybot) = x(j)A.band(atop:abot, j) + y(ytop:ybot)
    end


Notice that by storing A by column in A.band, we obtain a saxpy, column 
access procedure. Indeed, Algorithm 1.2.2 is obtained from Algorithm 1.1.4 
by recognizing that each saxpy involves a vector with a small number of 
nonzeros. Integer arithmetic is used to identify the location of these nonze- 
Tos. As a result of this careful zero/nonzero analysis, the algorithm involves 
just 2n(p + q+ 1) flops with the assumption that p and q are much smaller 
than n. 
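A minimal Python sketch of the band format (1.2.1) and of Algorithm 1.2.2, assuming lists of rows. The packing helper to_band is an illustrative addition, not part of the text. Index arithmetic is kept 1-based to match the formulas and is shifted by one only when the lists are accessed.

```python
def to_band(A, p, q):
    """Pack n-by-n A (lower bandwidth p, upper bandwidth q) into the
    (p+q+1)-by-n array of (1.2.1): a_ij = A.band(i-j+q+1, j), 1-based.
    Entries outside the band are ignored; unused slots hold 0."""
    n = len(A)
    band = [[0] * n for _ in range(p + q + 1)]
    for j in range(1, n + 1):
        for i in range(max(1, j - q), min(n, j + p) + 1):
            band[i - j + q][j - 1] = A[i - 1][j - 1]   # row i-j+q+1, shifted to 0-based
    return band


def band_gaxpy(band, p, q, x, y):
    """Overwrite y with Ax + y, A held in band form (Algorithm 1.2.2)."""
    n = len(x)
    for j in range(1, n + 1):
        ytop, ybot = max(1, j - q), min(n, j + p)
        atop = max(1, q + 2 - j)
        abot = atop + ybot - ytop
        # y(ytop:ybot) = x(j) * A.band(atop:abot, j) + y(ytop:ybot): one short saxpy
        for yi, ai in zip(range(ytop, ybot + 1), range(atop, abot + 1)):
            y[yi - 1] += x[j - 1] * band[ai - 1][j - 1]
    return y
```

Each column contributes a saxpy of length at most p + q + 1, which is where the $2n(p + q + 1)$ flop count comes from.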


1.2.7 Symmetry 
We say that $A \in \mathbb{R}^{n \times n}$ is symmetric if $A^T = A$. Thus,

$$A = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 4 & 5 \\ 3 & 5 & 6 \end{bmatrix}$$
is symmetric. Storage requirements can be halved if we just store the lower
triangle of elements, e.g., A.vec = [1 2 3 4 5 6]. In general, with
this data structure we agree to store the $a_{ij}$ as follows:

$$a_{ij} = \text{A.vec}\big((j-1)n - j(j-1)/2 + i\big) \qquad (i \ge j) \qquad (1.2.2)$$


Let us look at the column-oriented gaxpy operation with the matrix A 
represented in A.vec. 


Algorithm 1.2.3 (Symmetric Storage Gaxpy) Suppose $A \in \mathbb{R}^{n \times n}$ is
symmetric and stored in the A.vec style (1.2.2). If $x, y \in \mathbb{R}^n$, then this
algorithm overwrites y with $Ax + y$.


    for j = 1:n
       for i = 1:j-1
          y(i) = A.vec((i - 1)n - i(i - 1)/2 + j)x(j) + y(i)
       end
       for i = j:n
          y(i) = A.vec((j - 1)n - j(j - 1)/2 + i)x(j) + y(i)
       end
    end


This algorithm requires the same $2n^2$ flops that an ordinary gaxpy requires.
Notice that the halving of the storage requirement is purchased with some 
awkward subscripting. 
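The awkward subscripting is easiest to trust when it is tested. Below is a minimal Python sketch of the A.vec scheme (1.2.2) and Algorithm 1.2.3, assuming lists for storage. The helpers sym_to_vec and vec_pos are illustrative names. Entries above the diagonal are fetched through symmetry as $a_{ji}$.

```python
def sym_to_vec(A):
    """Lower triangle of symmetric A stored column by column (A.vec, (1.2.2))."""
    n = len(A)
    return [A[i][j] for j in range(n) for i in range(j, n)]


def vec_pos(i, j, n):
    """1-based position of a_ij (i >= j) in A.vec: (j-1)n - j(j-1)/2 + i."""
    return (j - 1) * n - j * (j - 1) // 2 + i


def sym_vec_gaxpy(avec, x, y):
    """Overwrite y with Ax + y using only the packed lower triangle
    (Algorithm 1.2.3)."""
    n = len(x)
    for j in range(1, n + 1):
        for i in range(1, j):            # i < j: a_ij = a_ji lives at vec_pos(j, i)
            y[i - 1] += avec[vec_pos(j, i, n) - 1] * x[j - 1]
        for i in range(j, n + 1):        # i >= j: stored directly
            y[i - 1] += avec[vec_pos(i, j, n) - 1] * x[j - 1]
    return y
```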

1.2.8 Store by Diagonal 

Symmetric matrices can also be stored by diagonal. If 


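$$A = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 4 & 5 \\ 3 & 5 & 6 \end{bmatrix},$$

then in a store-by-diagonal scheme we represent A with the vector
A.diag = [1 4 6 2 5 3].
In general, for $k \ge 0$ and $1 \le i \le n - k$,

$$a_{i+k,i} = \text{A.diag}\big(i + nk - k(k-1)/2\big) \qquad (1.2.3)$$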


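Some notation simplifies the discussion of how to use this data structure in
a matrix-vector multiplication.

If $A \in \mathbb{R}^{m \times n}$, then let $D(A, k) \in \mathbb{R}^{m \times n}$ designate the kth diagonal of A
as follows:

$$[D(A,k)]_{ij} = \begin{cases} a_{ij} & j = i + k,\ 1 \le i \le m,\ 1 \le j \le n \\ 0 & \text{otherwise} \end{cases}$$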




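Thus,

$$A = \begin{bmatrix}1&2&3\\2&4&5\\3&5&6\end{bmatrix}
= \underbrace{\begin{bmatrix}0&0&3\\0&0&0\\0&0&0\end{bmatrix}}_{D(A,2)}
+ \underbrace{\begin{bmatrix}0&2&0\\0&0&5\\0&0&0\end{bmatrix}}_{D(A,1)}
+ \underbrace{\begin{bmatrix}1&0&0\\0&4&0\\0&0&6\end{bmatrix}}_{D(A,0)}
+ \underbrace{\begin{bmatrix}0&0&0\\2&0&0\\0&5&0\end{bmatrix}}_{D(A,-1)}
+ \underbrace{\begin{bmatrix}0&0&0\\0&0&0\\3&0&0\end{bmatrix}}_{D(A,-2)}$$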


Returning to our store-by-diagonal data structure, we see that the nonzero
parts of $D(A,0), D(A,1), \ldots, D(A,n-1)$ are sequentially stored in the
A.diag scheme (1.2.3). The gaxpy $y = Ax + y$ can then be organized as
follows:

$$y = D(A,0)x + \sum_{k=1}^{n-1} \big(D(A,k) + D(A,k)^T\big)x + y.$$


Working out the details we obtain the following algorithm.


Algorithm 1.2.4 (Store-By-Diagonal Gaxpy) Suppose $A \in \mathbb{R}^{n \times n}$ is
symmetric and stored in the A.diag style (1.2.3). If $x, y \in \mathbb{R}^n$, then this
algorithm overwrites y with $Ax + y$.


    for i = 1:n
       y(i) = A.diag(i)x(i) + y(i)
    end
    for k = 1:n-1
       t = nk - k(k - 1)/2
       {y = D(A, k)x + y}
       for i = 1:n-k
          y(i) = A.diag(i + t)x(i + k) + y(i)
       end
       {y = D(A, k)^T x + y}
       for i = 1:n-k
          y(i + k) = A.diag(i + t)x(i) + y(i + k)
       end
    end


Note that the inner loops oversee vector multiplications (componentwise,
denoted .*):

    y(1:n-k) = A.diag(t+1:t+n-k) .* x(k+1:n) + y(1:n-k)
    y(k+1:n) = A.diag(t+1:t+n-k) .* x(1:n-k) + y(k+1:n)
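A minimal Python sketch of the store-by-diagonal scheme (1.2.3) and Algorithm 1.2.4, assuming list storage. The packing helper sym_to_diag is illustrative. It reads the kth diagonal from the lower triangle ($a_{i+k,i}$), which equals the kth superdiagonal by symmetry.

```python
def sym_to_diag(A):
    """Store symmetric A by diagonals as in (1.2.3): D(A,0) first, then the
    nonzero parts of D(A,1), D(A,2), ..., read from the lower triangle."""
    n = len(A)
    return [A[i + k][i] for k in range(n) for i in range(n - k)]


def diag_gaxpy(adiag, x, y):
    """Overwrite y with Ax + y for symmetric A stored by diagonal (Algorithm 1.2.4)."""
    n = len(x)
    for i in range(1, n + 1):                  # y = D(A,0)x + y
        y[i - 1] += adiag[i - 1] * x[i - 1]
    for k in range(1, n):
        t = n * k - k * (k - 1) // 2            # A.diag(t+1:t+n-k) holds the kth diagonal
        for i in range(1, n - k + 1):           # y = D(A,k)x + y
            y[i - 1] += adiag[i + t - 1] * x[i + k - 1]
        for i in range(1, n - k + 1):           # y = D(A,k)^T x + y
            y[i + k - 1] += adiag[i + t - 1] * x[i - 1]
    return y
```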
1.2.9 A Note on Overwriting and Workspaces


An undercurrent in the above discussion has been the economical use of 
storage. Overwriting input data is another way to control the amount of 
memory that a matrix computation requires. Consider the n-by-n matrix 
multiplication problem C = AB with the proviso that the “input matrix” 
B is to be overwritten by the “output matrix” C. We cannot simply
transform 


    C(1:n, 1:n) = 0
    for j = 1:n
       for k = 1:n
          C(:, j) = C(:, j) + A(:, k)B(k, j)
       end
    end

to

    for j = 1:n
       for k = 1:n
          B(:, j) = B(:, j) + A(:, k)B(k, j)
       end
    end


because B(:, j) is needed throughout the entire k-loop. A linear workspace
is needed to hold the jth column of the product until it is “safe” to overwrite
B(:, j):

    for j = 1:n
       w(1:n) = 0
       for k = 1:n
          w(:) = w(:) + A(:, k)B(k, j)
       end
       B(:, j) = w(:)
    end
A linear workspace overhead is usually not important in a matrix compu- 
tation that has a 2-dimensional array of the same order. 
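The workspace version can be sketched in Python as follows, assuming B is a mutable list of rows that the caller wants overwritten. The function name is illustrative. Only the single length-n list w is allocated, regardless of how many columns are processed.

```python
def overwrite_with_product(A, B):
    """Overwrite the n-by-n matrix B (list of rows) with AB, using a single
    length-n workspace w.

    Column j of the product is accumulated in w because B(k, j) is still
    read for every k; only when the k-loop ends is B(:, j) overwritten.
    """
    n = len(A)
    for j in range(n):
        w = [0] * n                          # w(1:n) = 0
        for k in range(n):
            bkj = B[k][j]
            for i in range(n):               # w = w + A(:, k) B(k, j)
                w[i] += A[i][k] * bkj
        for i in range(n):                   # B(:, j) = w
            B[i][j] = w[i]
    return B
```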


Problems 


P1.2.1 Give an algorithm that overwrites A with $A^2$ where $A \in \mathbb{R}^{n \times n}$ is (a) upper
triangular and (b) square. Strive for a minimum workspace in each case.

P1.2.2 Suppose $A \in \mathbb{R}^{n \times n}$ is upper Hessenberg and that scalars $\lambda_1, \ldots, \lambda_r$ are given.
Give a saxpy algorithm for computing the first column of $M = (A - \lambda_1 I)\cdots(A - \lambda_r I)$.

P1.2.3 Give a column saxpy algorithm for the n-by-n matrix multiplication problem
C = AB where A is upper triangular and B is lower triangular.

P1.2.4 Extend Algorithm 1.2.2 so that it can handle rectangular band matrices. Be
sure to describe the underlying data structure.

P1.2.5 $A \in \mathbb{C}^{n \times n}$ is Hermitian if $A^H = A$. If $A = B + iC$, then it is easy to show that
$B^T = B$ and $C^T = -C$. Suppose we represent A in an array A.herm with the property
that A.herm(i, j) houses $b_{ij}$ if $i \ge j$ and $c_{ij}$ if $j > i$. Using this data structure write a
matrix-vector multiply function that computes Re(z) and Im(z) from Re(x) and Im(x)
so that $z = Ax$.

P1.2.6 Suppose $X \in \mathbb{R}^{n \times p}$ and $A \in \mathbb{R}^{n \times n}$, with A symmetric and stored by diagonal.
Give an algorithm that computes $Y = X^T A X$ and stores the result by diagonal. Use
separate arrays for A and Y.

P1.2.7 Suppose $a \in \mathbb{R}^n$ is given and that $A \in \mathbb{R}^{n \times n}$ has the property that $a_{ij} =
a_{|i-j|+1}$. Give an algorithm that overwrites y with $Ax + y$ where $x, y \in \mathbb{R}^n$ are given.

P1.2.8 Suppose $a \in \mathbb{R}^n$ is given and that $A \in \mathbb{R}^{n \times n}$ has the property that $a_{ij} =
a_{((i+j-1) \bmod n)+1}$. Give an algorithm that overwrites y with $Ax + y$ where $x, y \in \mathbb{R}^n$
are given.

P1.2.9 Develop a compact store-by-diagonal scheme for unsymmetric band matrices
and write the corresponding gaxpy algorithm.

P1.2.10 Suppose p and q are n-vectors and that $A = (a_{ij})$ is defined by $a_{ij} = a_{ji} = p_i q_j$
for $1 \le i \le j \le n$. How many flops are required to compute $y = Ax$ where $x \in \mathbb{R}^n$ is
given?


Notes and References for Sec. 1.2 
Consult the LAPACK manual for a discussion about appropriate data structures when
symmetry and/or bandedness is present. See also


N. Madsen, G. Rodrigue, and J. Karush (1976). “Matrix Multiplication by Diagonals
on a Vector/Parallel Processor,” Information Processing Letters 5, 41-45.


1.3 Block Matrices and Algorithms 


Having a facility with block matrix notation is crucial in matrix computa-
tions because it simplifies the derivation of many central algorithms. More-
over, “block algorithms” are increasingly important in high performance
computing. By a block algorithm we essentially mean an algorithm that 
is rich in matrix-matrix multiplication. Algorithms of this type turn out 
to be more efficient in many computing environments than those that are 
organized at a lower linear algebraic level. 


1.3.1 Block Matrix Notation 


Column and row partitionings are special cases of matrix blocking. In 
general we can partition both the rows and columns of an m-by-n matrix 
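A to obtain

$$A = \begin{bmatrix} A_{11} & \cdots & A_{1r} \\ \vdots & \ddots & \vdots \\ A_{q1} & \cdots & A_{qr} \end{bmatrix}$$

where block row $\alpha$ has $m_\alpha$ rows, block column $\beta$ has $n_\beta$ columns,
$m_1 + \cdots + m_q = m$, $n_1 + \cdots + n_r = n$, and $A_{\alpha\beta}$ designates the
$(\alpha, \beta)$ block or submatrix. With this notation, block $A_{\alpha\beta}$ has dimension
$m_\alpha$-by-$n_\beta$ and we say that $A = (A_{\alpha\beta})$ is a q-by-r block matrix.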


1.3.2 Block Matrix Manipulation 

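Block matrices combine just like matrices with scalar entries as long as
certain dimension requirements are met. For example, if

$$B = \begin{bmatrix} B_{11} & \cdots & B_{1r} \\ \vdots & \ddots & \vdots \\ B_{q1} & \cdots & B_{qr} \end{bmatrix}, \qquad B_{\alpha\beta} \in \mathbb{R}^{m_\alpha \times n_\beta},$$

then we say that B is partitioned conformably with the matrix A above.
The sum C = A + B can also be regarded as a q-by-r block matrix:

$$C = \begin{bmatrix} C_{11} & \cdots & C_{1r} \\ \vdots & \ddots & \vdots \\ C_{q1} & \cdots & C_{qr} \end{bmatrix} = \begin{bmatrix} A_{11} + B_{11} & \cdots & A_{1r} + B_{1r} \\ \vdots & \ddots & \vdots \\ A_{q1} + B_{q1} & \cdots & A_{qr} + B_{qr} \end{bmatrix}.$$

The multiplication of block matrices is a little trickier. We start with a pair
of lemmas.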
The multiplication of block matrices is a little trickier. We start with a pair 
of lemmas. 


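Lemma 1.3.1 If $A \in \mathbb{R}^{m \times p}$, $B \in \mathbb{R}^{p \times n}$,

$$A = \begin{bmatrix} A_1 \\ \vdots \\ A_q \end{bmatrix}, \qquad B = [\,B_1, \ldots, B_r\,],$$

with $A_\alpha \in \mathbb{R}^{m_\alpha \times p}$ and $B_\beta \in \mathbb{R}^{p \times n_\beta}$, then

$$AB = C = \begin{bmatrix} C_{11} & \cdots & C_{1r} \\ \vdots & \ddots & \vdots \\ C_{q1} & \cdots & C_{qr} \end{bmatrix}$$

where $C_{\alpha\beta} = A_\alpha B_\beta$ for $\alpha = 1{:}q$ and $\beta = 1{:}r$.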
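Proof. First we relate scalar entries in block $C_{\alpha\beta}$ to scalar entries in C.
For $1 \le \alpha \le q$, $1 \le \beta \le r$, $1 \le i \le m_\alpha$, and $1 \le j \le n_\beta$ we have

$$[C_{\alpha\beta}]_{ij} = c_{\lambda+i,\,\mu+j}$$

where

$$\lambda = m_1 + \cdots + m_{\alpha-1}, \qquad \mu = n_1 + \cdots + n_{\beta-1}.$$

But

$$c_{\lambda+i,\,\mu+j} = \sum_{k=1}^{p} a_{\lambda+i,\,k}\, b_{k,\,\mu+j} = \sum_{k=1}^{p} [A_\alpha]_{ik} [B_\beta]_{kj} = [A_\alpha B_\beta]_{ij}.$$

Thus, $C_{\alpha\beta} = A_\alpha B_\beta$. □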


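Lemma 1.3.2 If $A \in \mathbb{R}^{m \times p}$, $B \in \mathbb{R}^{p \times n}$,

$$A = [\,A_1, \ldots, A_s\,] \quad \text{and} \quad B = \begin{bmatrix} B_1 \\ \vdots \\ B_s \end{bmatrix},$$

with $A_\gamma \in \mathbb{R}^{m \times p_\gamma}$ and $B_\gamma \in \mathbb{R}^{p_\gamma \times n}$, then

$$AB = C = \sum_{\gamma=1}^{s} A_\gamma B_\gamma.$$

Proof. We set s = 2 and leave the general s case to the reader. (See
P1.3.6.) For $1 \le i \le m$ and $1 \le j \le n$ we have

$$c_{ij} = \sum_{k=1}^{p} a_{ik}b_{kj} = \sum_{k=1}^{p_1} a_{ik}b_{kj} + \sum_{k=p_1+1}^{p_1+p_2} a_{ik}b_{kj} = [A_1B_1]_{ij} + [A_2B_2]_{ij} = [A_1B_1 + A_2B_2]_{ij}.$$

Thus, $C = A_1B_1 + A_2B_2$. □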
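A quick numerical check of Lemma 1.3.2 with s = 2 may help. This Python sketch assumes lists of rows; the helper names are illustrative. It splits A after column $p_1$ and B after row $p_1$, then confirms that $A_1B_1 + A_2B_2$ reproduces the conventional product.

```python
def matmul(A, B):
    """Conventional product of an m-by-p and a p-by-n matrix (lists of rows)."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def matadd(X, Y):
    """Entrywise sum of two matrices of the same shape."""
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]


def two_block_product(A, B, p1):
    """A1 B1 + A2 B2, where A = [A1, A2] is split after column p1 and
    B = [B1; B2] is split after row p1 (Lemma 1.3.2 with s = 2)."""
    A1 = [row[:p1] for row in A]
    A2 = [row[p1:] for row in A]
    B1, B2 = B[:p1], B[p1:]
    return matadd(matmul(A1, B1), matmul(A2, B2))
```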


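For general block matrix multiplication we have the following result:

Theorem 1.3.3 If

$$A = \begin{bmatrix} A_{11} & \cdots & A_{1s} \\ \vdots & \ddots & \vdots \\ A_{q1} & \cdots & A_{qs} \end{bmatrix}, \qquad B = \begin{bmatrix} B_{11} & \cdots & B_{1r} \\ \vdots & \ddots & \vdots \\ B_{s1} & \cdots & B_{sr} \end{bmatrix},$$

where $A_{\alpha\gamma} \in \mathbb{R}^{m_\alpha \times p_\gamma}$ and $B_{\gamma\beta} \in \mathbb{R}^{p_\gamma \times n_\beta}$, and we partition the product
C = AB as follows,

$$C = \begin{bmatrix} C_{11} & \cdots & C_{1r} \\ \vdots & \ddots & \vdots \\ C_{q1} & \cdots & C_{qr} \end{bmatrix}, \qquad C_{\alpha\beta} \in \mathbb{R}^{m_\alpha \times n_\beta},$$

then

$$C_{\alpha\beta} = \sum_{\gamma=1}^{s} A_{\alpha\gamma}B_{\gamma\beta}, \qquad \alpha = 1{:}q,\ \beta = 1{:}r.$$

Proof. See P1.3.7. □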


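A very important special case arises if we set s = 2, r = 1, and n = 1:

$$\begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} A_{11}x_1 + A_{12}x_2 \\ A_{21}x_1 + A_{22}x_2 \end{bmatrix}.$$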


This partitioned matrix-vector product is used over and over again in sub- 
sequent chapters. 


1.3.3 Submatrix Designation
As with “ordinary” matrix multiplication, block matrix multiplication can
be organized in several ways. To specify the computations precisely, we
need some notation.
Suppose $A \in \mathbb{R}^{m \times n}$ and that $i = (i_1, \ldots, i_r)$ and $j = (j_1, \ldots, j_c)$ are
integer vectors with the property that

$$i_1, \ldots, i_r \in \{1, 2, \ldots, m\}, \qquad j_1, \ldots, j_c \in \{1, 2, \ldots, n\}.$$

We let A(i, j) denote the r-by-c submatrix

$$A(i, j) = \begin{bmatrix} a_{i_1,j_1} & \cdots & a_{i_1,j_c} \\ \vdots & \ddots & \vdots \\ a_{i_r,j_1} & \cdots & a_{i_r,j_c} \end{bmatrix}.$$

If the entries in the subscript vectors i and j are contiguous, then the
“colon” notation can be used to define A(i, j) in terms of the scalar entries
in A. In particular, if $1 \le i_1 \le i_2 \le m$ and $1 \le j_1 \le j_2 \le n$, then
$A(i_1{:}i_2, j_1{:}j_2)$ is the submatrix obtained by extracting rows $i_1$ through $i_2$
and columns $j_1$ through $j_2$, e.g.,

$$A(3{:}5, 1{:}2) = \begin{bmatrix} a_{31} & a_{32} \\ a_{41} & a_{42} \\ a_{51} & a_{52} \end{bmatrix}.$$
While on the subject of submatrices, recall from §1.1.8 that if i and j are
scalars, then A(i, :) designates the ith row of A and A(:, j) designates the
jth column of A.
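A small Python sketch of the A(i, j) notation, assuming 1-based index vectors given as Python lists. The helpers colon and submatrix are hypothetical names introduced only for illustration. The index vectors need not be contiguous or sorted; the colon form is just the contiguous special case.

```python
def colon(lo, hi):
    """The 1-based colon range lo:hi as a list (empty when lo > hi)."""
    return list(range(lo, hi + 1))


def submatrix(A, i, j):
    """A(i, j) for 1-based integer vectors i = (i_1,...,i_r), j = (j_1,...,j_c):
    the r-by-c matrix whose (s, t) entry is a_{i_s, j_t}."""
    return [[A[r - 1][c - 1] for c in j] for r in i]


# A 5-by-3 test matrix with a_rc = 10r + c, so entries show their own indices.
A = [[10 * r + c for c in range(1, 4)] for r in range(1, 6)]
block = submatrix(A, colon(3, 5), colon(1, 2))   # A(3:5, 1:2)
```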


1.3.4 Block Matrix Times Vector 


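An important situation covered by Theorem 1.3.3 is the case of a block
matrix times vector. Let us consider the details of the gaxpy $y = Ax + y$
where $A \in \mathbb{R}^{m \times n}$, $x \in \mathbb{R}^n$, $y \in \mathbb{R}^m$, and

$$A = \begin{bmatrix} A_1 \\ \vdots \\ A_q \end{bmatrix}, \qquad y = \begin{bmatrix} y_1 \\ \vdots \\ y_q \end{bmatrix}, \qquad A_i \in \mathbb{R}^{m_i \times n},\ y_i \in \mathbb{R}^{m_i}.$$

We refer to $A_i$ as the ith block row. If m.vec = $(m_1, \ldots, m_q)$ is the vector
of block row “heights”, then from

$$\begin{bmatrix} y_1 \\ \vdots \\ y_q \end{bmatrix} = \begin{bmatrix} A_1 \\ \vdots \\ A_q \end{bmatrix} x + \begin{bmatrix} y_1 \\ \vdots \\ y_q \end{bmatrix}$$

we obtain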
    last = 0
    for i = 1:q
       first = last + 1
       last = first + m.vec(i) - 1                        (1.3.1)
       y(first:last) = A(first:last, :)x + y(first:last)
    end


Each time through the loop an “ordinary” gaxpy is performed so Algorithms
1.1.3 and 1.1.4 apply.

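Another way to block the gaxpy computation is to partition A and x as
follows:

$$A = [\,A_1, \ldots, A_r\,], \qquad x = \begin{bmatrix} x_1 \\ \vdots \\ x_r \end{bmatrix}, \qquad A_j \in \mathbb{R}^{m \times n_j},\ x_j \in \mathbb{R}^{n_j}.$$

In this case we refer to $A_j$ as the jth block column of A. If n.vec =
$(n_1, \ldots, n_r)$ is the vector of block column widths, then from

$$y = [\,A_1, \ldots, A_r\,]\begin{bmatrix} x_1 \\ \vdots \\ x_r \end{bmatrix} + y = \sum_{j=1}^{r} A_j x_j + y$$

we obtain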
    last = 0
    for j = 1:r
       first = last + 1
       last = first + n.vec(j) - 1                        (1.3.2)
       y = A(:, first:last)x(first:last) + y
    end


Again, the gaxpys performed each time through the loop can be carried
out with Algorithm 1.1.3 or 1.1.4. 
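Both blockings can be sketched in Python, assuming lists of rows and the m.vec/n.vec conventions above (passed here as Python lists). The function names are illustrative. The first/last bookkeeping is kept 1-based as in (1.3.1) and (1.3.2).

```python
def gaxpy_block_rows(A, x, y, m_vec):
    """(1.3.1): y = Ax + y one block row at a time; m_vec holds the block heights."""
    if sum(m_vec) != len(A):
        raise ValueError("block heights must add up to the number of rows")
    last = 0
    for mi in m_vec:
        first = last + 1
        last = first + mi - 1
        for r in range(first, last + 1):      # y(first:last) += A(first:last, :) x
            y[r - 1] += sum(a * xk for a, xk in zip(A[r - 1], x))
    return y


def gaxpy_block_cols(A, x, y, n_vec):
    """(1.3.2): y = Ax + y one block column at a time; n_vec holds the block widths."""
    if sum(n_vec) != len(x):
        raise ValueError("block widths must add up to the number of columns")
    last = 0
    for nj in n_vec:
        first = last + 1
        last = first + nj - 1
        for r in range(len(A)):               # y += A(:, first:last) x(first:last)
            y[r] += sum(A[r][c - 1] * x[c - 1] for c in range(first, last + 1))
    return y
```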


1.3.5 Block Matrix Multiplication 


Just as ordinary, scalar-level matrix multiplication can be arranged in sev- 
eral possible ways, so can the multiplication of block matrices. Different 
blockings for A, B, and C can set the stage for block versions of the dot 
product, saxpy, and outer product algorithms of §1.1. To illustrate this
with a minimum of subscript clutter, we assume that these three matrices
are all n-by-n and that $n = N\ell$ where N and ℓ are positive integers.

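If $A = (A_{\alpha\beta})$, $B = (B_{\alpha\beta})$, and $C = (C_{\alpha\beta})$ are N-by-N block matrices
with ℓ-by-ℓ blocks, then from Theorem 1.3.3

$$C_{\alpha\beta} = \sum_{\gamma=1}^{N} A_{\alpha\gamma}B_{\gamma\beta} + C_{\alpha\beta}, \qquad \alpha = 1{:}N,\ \beta = 1{:}N.$$

If we organize a matrix multiplication procedure around this summation,
then we obtain a block analog of Algorithm 1.1.5: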


    for α = 1:N
       i = (α - 1)ℓ + 1:αℓ
       for β = 1:N
          j = (β - 1)ℓ + 1:βℓ                             (1.3.3)
          for γ = 1:N
             k = (γ - 1)ℓ + 1:γℓ
             C(i, j) = A(i, k)B(k, j) + C(i, j)
          end
       end
    end


Note that if ℓ = 1, then α ≡ i, β ≡ j, and γ ≡ k and we revert to Algorithm
1.1.5.
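A minimal Python sketch of (1.3.3), assuming n-by-n lists of rows and a block size ell that divides n (the name block_dot_matmul is illustrative). The innermost ℓ-by-ℓ block product is done conventionally; with ell = 1 it reduces to the scalar triple loop, as noted above.

```python
def block_dot_matmul(A, B, C, ell):
    """C = AB + C for n-by-n lists of rows, n = N*ell, organized as (1.3.3).

    The three outer loops run over block indices alpha, beta, gamma; the
    innermost work is the ell-by-ell update C(i, j) = A(i, k)B(k, j) + C(i, j).
    """
    n = len(A)
    if n % ell:
        raise ValueError("the block size ell must divide n")
    N = n // ell
    for alpha in range(N):
        i = range(alpha * ell, (alpha + 1) * ell)
        for beta in range(N):
            j = range(beta * ell, (beta + 1) * ell)
            for gamma in range(N):
                k = range(gamma * ell, (gamma + 1) * ell)
                for r in i:                  # conventional ell-by-ell block product
                    for c in j:
                        C[r][c] += sum(A[r][t] * B[t][c] for t in k)
    return C
```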
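To obtain a block saxpy matrix multiply, we write C = AB + C as

$$[\,C_1, \ldots, C_N\,] = [\,A_1, \ldots, A_N\,]\begin{bmatrix} B_{11} & \cdots & B_{1N} \\ \vdots & \ddots & \vdots \\ B_{N1} & \cdots & B_{NN} \end{bmatrix} + [\,C_1, \ldots, C_N\,]$$

where $A_\alpha, C_\alpha \in \mathbb{R}^{n \times \ell}$ and $B_{\alpha\beta} \in \mathbb{R}^{\ell \times \ell}$. From this we obtain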
    for β = 1:N
       j = (β - 1)ℓ + 1:βℓ
       for α = 1:N
          i = (α - 1)ℓ + 1:αℓ                             (1.3.4)
          C(:, j) = A(:, i)B(i, j) + C(:, j)
       end
    end


This is the block version of Algorithm 1.1.7. 
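A block outer product scheme results if we work with the blockings

$$A = [\,A_1, \ldots, A_N\,], \qquad B = \begin{bmatrix} B_1^T \\ \vdots \\ B_N^T \end{bmatrix}$$

where $A_\gamma, B_\gamma \in \mathbb{R}^{n \times \ell}$. From Lemma 1.3.2 we have

$$C = \sum_{\gamma=1}^{N} A_\gamma B_\gamma^T + C$$

and so

    for γ = 1:N
       k = (γ - 1)ℓ + 1:γℓ
       C = A(:, k)B(k, :) + C                              (1.3.5)
    end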


This is the block version of Algorithm 1.1.8. 
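For comparison with the dot product form, here is a minimal Python sketch of (1.3.5). It makes the same assumptions as before (n-by-n lists of rows, ell dividing n), and the name is illustrative. Each pass of the gamma loop adds one rank-ℓ update $A(:,k)B(k,:)$ to all of C.

```python
def block_outer_matmul(A, B, C, ell):
    """C = AB + C for n-by-n lists of rows, n = N*ell, organized as (1.3.5):
    N rank-ell updates C = A(:, k)B(k, :) + C with k the gamma-th index block."""
    n = len(A)
    if n % ell:
        raise ValueError("the block size ell must divide n")
    for gamma in range(n // ell):
        k = range(gamma * ell, (gamma + 1) * ell)
        for r in range(n):
            for c in range(n):
                C[r][c] += sum(A[r][t] * B[t][c] for t in k)
    return C
```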


1.3.6 Complex Matrix Multiplication 
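Consider the complex matrix multiplication update

$$C_1 + iC_2 = (A_1 + iA_2)(B_1 + iB_2) + (C_1 + iC_2)$$

where all the matrices are real and $i^2 = -1$. Comparing the real and
imaginary parts we find

$$C_1 = A_1B_1 - A_2B_2 + C_1$$
$$C_2 = A_1B_2 + A_2B_1 + C_2$$

and this can be expressed as follows:

$$\begin{bmatrix} C_1 \\ C_2 \end{bmatrix} = \begin{bmatrix} A_1 & -A_2 \\ A_2 & A_1 \end{bmatrix}\begin{bmatrix} B_1 \\ B_2 \end{bmatrix} + \begin{bmatrix} C_1 \\ C_2 \end{bmatrix}.$$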
This suggests how real matrix software might be applied to solve complex
matrix problems. The only snag is that the explicit formation of

$$\begin{bmatrix} A_1 & -A_2 \\ A_2 & A_1 \end{bmatrix}$$

requires the “double storage” of the matrices $A_1$ and $A_2$.
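A minimal Python sketch of this real formulation, assuming real n-by-n matrices stored as lists of rows (the function name is illustrative). It explicitly builds the 2n-by-2n matrix, so it exhibits exactly the double storage just mentioned.

```python
def complex_update_real(A1, A2, B1, B2, C1, C2):
    """Return (C1, C2) with C1 + iC2 = (A1 + iA2)(B1 + iB2) + (C1 + iC2),
    using only real arithmetic on the stacked system
        [C1; C2] = [A1 -A2; A2 A1][B1; B2] + [C1; C2]."""
    n = len(A1)
    # Explicit 2n-by-2n real matrix: this is the "double storage" of A1 and A2.
    big = ([A1[r] + [-v for v in A2[r]] for r in range(n)]
           + [A2[r] + A1[r] for r in range(n)])
    rhs = B1 + B2                            # [B1; B2] as 2n rows
    prod = [[sum(big[r][t] * rhs[t][c] for t in range(2 * n)) for c in range(n)]
            for r in range(2 * n)]
    new_C1 = [[prod[r][c] + C1[r][c] for c in range(n)] for r in range(n)]
    new_C2 = [[prod[n + r][c] + C2[r][c] for c in range(n)] for r in range(n)]
    return new_C1, new_C2
```

The results can be compared against Python's built-in complex arithmetic, which is how the sketch was checked.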


1.3.7 A Divide and Conquer Matrix Multiplication 


We conclude this section with a completely different approach to the matrix-
matrix multiplication problem. The starting point in the discussion is the
2-by-2 block matrix multiplication

$$\begin{bmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{bmatrix} = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}\begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{bmatrix}$$

where each block is square. In the ordinary algorithm, $C_{ij} = A_{i1}B_{1j} +
A_{i2}B_{2j}$. There are 8 multiplies and 4 adds. Strassen (1969) has shown how
to compute C with just 7 multiplies and 18 adds:


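$$\begin{aligned}
P_1 &= (A_{11} + A_{22})(B_{11} + B_{22}) \\
P_2 &= (A_{21} + A_{22})B_{11} \\
P_3 &= A_{11}(B_{12} - B_{22}) \\
P_4 &= A_{22}(B_{21} - B_{11}) \\
P_5 &= (A_{11} + A_{12})B_{22} \\
P_6 &= (A_{21} - A_{11})(B_{11} + B_{12}) \\
P_7 &= (A_{12} - A_{22})(B_{21} + B_{22}) \\
C_{11} &= P_1 + P_4 - P_5 + P_7 \\
C_{12} &= P_3 + P_5 \\
C_{21} &= P_2 + P_4 \\
C_{22} &= P_1 + P_3 - P_2 + P_6
\end{aligned}$$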


These equations are easily confirmed by substitution. Suppose n = 2m so
that the blocks are m-by-m. Counting adds and multiplies in the computation
C = AB we find that conventional matrix multiplication involves
$(2m)^3$ multiplies and $(2m)^3 - (2m)^2$ adds. In contrast, if Strassen's algorithm
is applied with conventional multiplication at the block level, then
$7m^3$ multiplies and $7m^3 + 11m^2$ adds are required. If $m \gg 1$, then the
Strassen method involves about 7/8ths the arithmetic of the fully conventional
algorithm.
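The "confirmed by substitution" step and the 7/8 claim can be checked mechanically. In this Python sketch (names illustrative), strassen_2x2 forms the seven products for 2-by-2 numeric operands, and op_counts totals adds plus multiplies using the counts stated in the text.

```python
def strassen_2x2(a, b):
    """One step of Strassen's scheme for 2-by-2 operands given as lists of rows.
    Exactly seven products P1..P7 are formed."""
    (a11, a12), (a21, a22) = a
    (b11, b12), (b21, b22) = b
    p1 = (a11 + a22) * (b11 + b22)
    p2 = (a21 + a22) * b11
    p3 = a11 * (b12 - b22)
    p4 = a22 * (b21 - b11)
    p5 = (a11 + a12) * b22
    p6 = (a21 - a11) * (b11 + b12)
    p7 = (a12 - a22) * (b21 + b22)
    return [[p1 + p4 - p5 + p7, p3 + p5],
            [p2 + p4, p1 + p3 - p2 + p6]]


def op_counts(m):
    """Total adds plus multiplies for a 2m-by-2m product: conventional versus one
    level of Strassen with conventional m-by-m block products (counts from the text)."""
    conventional = (2 * m) ** 3 + ((2 * m) ** 3 - (2 * m) ** 2)
    strassen = 7 * m ** 3 + (7 * m ** 3 + 11 * m ** 2)
    return conventional, strassen
```

For m = 1 this gives 12 versus 25 operations, so Strassen's idea only pays off once the blocks are reasonably large.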

Now recognize that we can recur on the Strassen idea. In particular, we
can apply the Strassen algorithm to each of the half-sized block multiplications
associated with the $P_i$. Thus, if the original A and B are n-by-n and
$n = 2^q$, then we can repeatedly apply the Strassen multiplication algorithm.
At the bottom “level,” the blocks are 1-by-1. Of course, there is no need to
recur down to the n = 1 level. When the block size gets sufficiently small
($n \le n_{min}$), it may be sensible to use conventional matrix multiplication
when finding the $P_i$. Here is the overall procedure:


Algorithm 1.3.1 (Strassen Multiplication) Suppose $n = 2^q$ and that
$A \in \mathbb{R}^{n \times n}$ and $B \in \mathbb{R}^{n \times n}$. If $n_{min} = 2^d$ with $d \le q$, then this algorithm
computes C = AB by applying the Strassen procedure recursively q - d times.


    function: C = strass(A, B, n, nmin)
       if n ≤ nmin
          C = AB
       else
          m = n/2; u = 1:m; v = m + 1:n;
          P1 = strass(A(u, u) + A(v, v), B(u, u) + B(v, v), m, nmin)
          P2 = strass(A(v, u) + A(v, v), B(u, u), m, nmin)
          P3 = strass(A(u, u), B(u, v) - B(v, v), m, nmin)
          P4 = strass(A(v, v), B(v, u) - B(u, u), m, nmin)
          P5 = strass(A(u, u) + A(u, v), B(v, v), m, nmin)
          P6 = strass(A(v, u) - A(u, u), B(u, u) + B(u, v), m, nmin)
          P7 = strass(A(u, v) - A(v, v), B(v, u) + B(v, v), m, nmin)
          C(u, u) = P1 + P4 - P5 + P7
          C(u, v) = P3 + P5
          C(v, u) = P2 + P4
          C(v, v) = P1 + P3 - P2 + P6
       end


Unlike any of our previous algorithms, strass is recursive, meaning that
it calls itself. Divide and conquer algorithms are often best described in 
this manner. We have presented this algorithm in the style of a MATLAB 
function so that the recursive calls can be stated with precision. 
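The same recursion can be sketched in Python, assuming n is a power of two and matrices are lists of rows. The helpers mat_add, mat_sub, conventional and blk are illustrative names, not from the text. The sketch mirrors Algorithm 1.3.1 but makes no attempt at efficiency (every block is copied).

```python
def mat_add(X, Y):
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]


def mat_sub(X, Y):
    return [[a - b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]


def conventional(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def strass(A, B, n, nmin):
    """Recursive Strassen product of n-by-n A and B (n a power of 2), switching
    to the conventional product once n <= nmin (Algorithm 1.3.1)."""
    if n <= nmin:
        return conventional(A, B)
    m = n // 2

    def blk(X, r, c):                      # the (r+1, c+1) m-by-m block, r, c in {0, 1}
        return [row[c * m:(c + 1) * m] for row in X[r * m:(r + 1) * m]]

    A11, A12, A21, A22 = blk(A, 0, 0), blk(A, 0, 1), blk(A, 1, 0), blk(A, 1, 1)
    B11, B12, B21, B22 = blk(B, 0, 0), blk(B, 0, 1), blk(B, 1, 0), blk(B, 1, 1)
    P1 = strass(mat_add(A11, A22), mat_add(B11, B22), m, nmin)
    P2 = strass(mat_add(A21, A22), B11, m, nmin)
    P3 = strass(A11, mat_sub(B12, B22), m, nmin)
    P4 = strass(A22, mat_sub(B21, B11), m, nmin)
    P5 = strass(mat_add(A11, A12), B22, m, nmin)
    P6 = strass(mat_sub(A21, A11), mat_add(B11, B12), m, nmin)
    P7 = strass(mat_sub(A12, A22), mat_add(B21, B22), m, nmin)
    C11 = mat_add(mat_sub(mat_add(P1, P4), P5), P7)
    C12 = mat_add(P3, P5)
    C21 = mat_add(P2, P4)
    C22 = mat_add(mat_sub(mat_add(P1, P3), P2), P6)
    # Reassemble the four m-by-m blocks into the n-by-n result.
    return ([r1 + r2 for r1, r2 in zip(C11, C12)]
            + [r1 + r2 for r1, r2 in zip(C21, C22)])
```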

The amount of arithmetic associated with strass is a complicated function
of n and nmin. If nmin ≫ 1, then it suffices to count multiplications
as the number of additions is roughly the same. If we just count the
multiplications, then it suffices to examine the deepest level of the recursion
as that is where all the multiplications occur. In strass there are q - d
subdivisions and thus $7^{q-d}$ conventional matrix-matrix multiplications to
perform. These multiplications have size nmin and thus strass involves
about $s = (2^d)^3\, 7^{q-d}$ multiplications compared to $c = (2^q)^3$, the number
of multiplications in the conventional approach. Notice that

$$\frac{s}{c} = \left(\frac{2^d}{2^q}\right)^3 7^{q-d} = \left(\frac{7}{8}\right)^{q-d}.$$
If d = 0, i.e., we recur on down to the 1-by-1 level, then

$$s = \left(\frac{7}{8}\right)^{q} c = 7^q = n^{\log_2 7} \approx n^{2.807}.$$

Thus, asymptotically, the number of multiplications in the Strassen procedure
is $O(n^{2.807})$. However, the number of additions (relative to the number
of multiplications) becomes significant as nmin gets small.


Example 1.3.1 If n = 1024 and nmin = 64, then strass involves $(7/8)^{10-6} \approx .6$ the
arithmetic of the conventional algorithm.
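The ratio $s/c = (7/8)^{q-d}$ is easy to tabulate. This Python sketch (the function name is illustrative) counts multiplications only, as in the derivation above, and requires n and nmin to be powers of two.

```python
import math


def strass_ratio(n, nmin):
    """s/c = (7/8)^(q-d): multiplications in strass relative to the conventional
    algorithm, for n = 2^q and nmin = 2^d (counting multiplications only)."""
    q, d = int(math.log2(n)), int(math.log2(nmin))
    if 2 ** q != n or 2 ** d != nmin or d > q:
        raise ValueError("n and nmin must be powers of two with nmin <= n")
    return (7 / 8) ** (q - d)


ratio = strass_ratio(1024, 64)   # (7/8)^4 = 2401/4096, roughly 0.59
```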


Problems 


P1.3.1 Generalize (1.3.3) so that it can handle the variable block-size problem covered
by Theorem 1.3.3.


P1.3.2 Generalize (1.3.4) and (1.3.5) so that they can handle the variable block-size 
case. 

P1.3.3 Adapt strass so that it can handle square matrix multiplication of any order.
Hint: If the “current” A has odd dimension, append a zero row and column.

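P1.3.4 Prove that if

$$A = \begin{bmatrix} A_{11} & \cdots & A_{1r} \\ \vdots & & \vdots \\ A_{q1} & \cdots & A_{qr} \end{bmatrix}$$

is a blocking of the matrix A, then

$$A^T = \begin{bmatrix} A_{11}^T & \cdots & A_{q1}^T \\ \vdots & & \vdots \\ A_{1r}^T & \cdots & A_{qr}^T \end{bmatrix}.$$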


P1.3.5 Suppose n is even and define the following function from $\mathbb{R}^n$ to $\mathbb{R}$:

$$f(x) = x(1{:}2{:}n)^T x(2{:}2{:}n) = \sum_{i=1}^{n/2} x_{2i-1}x_{2i}.$$

(a) Show that if $x, y \in \mathbb{R}^n$ then

$$x^T y = \sum_{i=1}^{n/2} (x_{2i-1} + y_{2i})(x_{2i} + y_{2i-1}) - f(x) - f(y).$$

(b) Now consider the n-by-n matrix multiplication C = AB. Give an algorithm for
computing this product that requires $n^3/2$ multiplies once f is applied to the rows of A
and the columns of B. See Winograd (1968) for details.


P1.3.6 Prove Lemma 1.3.2 for general s. Hint: Set

$$\rho_\gamma = p_1 + \cdots + p_{\gamma-1}, \qquad \gamma = 1{:}s+1,$$

and show that

$$c_{ij} = \sum_{\gamma=1}^{s} \sum_{k=\rho_\gamma+1}^{\rho_{\gamma+1}} a_{ik}b_{kj}.$$
P1.3.7 Use Lemmas 1.3.1 and 1.3.2 to prove Theorem 1.3.3. In particular, set

$$A_\gamma = \begin{bmatrix} A_{1\gamma} \\ \vdots \\ A_{q\gamma} \end{bmatrix} \quad \text{and} \quad B_\gamma = [\,B_{\gamma 1} \cdots B_{\gamma r}\,]$$

and note from Lemma 1.3.2 that

$$C = \sum_{\gamma=1}^{s} A_\gamma B_\gamma.$$

Now analyze each $A_\gamma B_\gamma$ with the help of Lemma 1.3.1.


Notes and References for Sec. 1.3 


For quite some time fast methods for matrix multiplication have attracted a lot of
attention within computer science. See


S. Winograd (1968). “A New Algorithm for Inner Product,” IEEE Trans. Comp. C-17,
693-694.

V. Strassen (1969). “Gaussian Elimination is Not Optimal,” Numer. Math. 13, 354-356.

V. Pan (1984). “How Can We Speed Up Matrix Multiplication?,” SIAM Review 26,
393-416.


Many of these methods have dubious practical value. However, with the publication of 


D. Bailey (1988). “Extra High Speed Matrix Multiplication on the Cray-2,” SIAM J.
Sci. and Stat. Comp. 9, 603-607.


it is clear that the blanket dismissal of these fast procedures is unwise. The “stability”
of the Strassen algorithm is discussed in §2.4.10. See also


N.J. Higham (1990). “Exploiting Fast Matrix Multiplication within the Level 3 BLAS,”
ACM Trans. Math. Soft. 16, 352-368.

C.C. Douglas, M. Heroux, G. Slishman, and R.M. Smith (1994). “GEMMW: A Portable
Level 3 BLAS Winograd Variant of Strassen's Matrix-Matrix Multiply Algorithm,”
J. Comput. Phys. 110, 1-10.


1.4 Vectorization and Re-Use Issues 


The matrix manipulations discussed in this book are mostly built upon 
dot products and saxpy operations. Vector pipeline computers are able 
to perform vector operations such as these very fast because of special 
hardware that is able to exploit the fact that a vector operation is a very 
regular sequence of scalar operations. Whether or not high performance 
is extracted from such a computer depends upon the length of the vector 
operands and a number of other factors that pertain to the movement of 
data such as vector stride, the number of vector loads and stores, and 
the level of data re-use. Our goal is to build a useful awareness of these 
issues. We are not trying to build a comprehensive model of vector pipeline 
computing that might be used to predict performance. We simply want to
identify the kind of thinking that goes into the design of an effective vector 
pipeline code. We do not mention any particular machine. The literature 
is filled with case studies. 


1.4.1 Pipelining Arithmetic Operations 


The primary reason why vector computers are fast has to do with pipelin- 
ing. The concept of pipelining is best understood by making an analogy to 
assembly line production. Suppose the assembly of an individual automo- 
bile requires one minute at each of sixty workstations along an assembly 
line. If the line is well staffed and able to initiate the assembly of a new car 
every minute, then 1000 cars can be produced from scratch in about 1000 
+ 60 = 1060 minutes. For a work order of this size the line has an effective 
"vector speed" of 1000/1060 automobiles per minute. On the other hand, 
if the assembly line is understaffed and a new assembly can be initiated 
just once an hour, then 1000 hours are required to produce 1000 cars. In 
this case the line has an effective "scalar speed" of 1/60th automobile per 
minute. 

So it is with a pipelined vector operation such as the vector add z = x + y.
The scalar operations $z_i = x_i + y_i$ are the cars. The number of elements
is the size of the work order. If the start-to-finish time required for each
$z_i$ is τ, then a pipelined, length n vector add could be completed in time
much less than nτ. This gives vector speed. Without the pipelining, the
vector computation would proceed at a scalar rate and would require
approximately time nτ for completion.

Let us see how a sequence of floating point operations can be pipelined.
Floating point operations usually require several cycles to complete. For
example, a 3-cycle addition of two scalars x and y may proceed as in
Fig. 1.4.1. To visualize the operation, continue with the above metaphor
and think of the addition unit as an assembly line with three “work
stations.” The input scalars x and y proceed along the assembly line spending
one cycle at each of three stations. The sum z emerges after three cycles.

[Fig. 1.4.1 A 3-Cycle Adder]
[Fig. 1.4.2 Pipelined Addition]


Note that when a single, “free standing” addition is performed, only one of
the three stations is active during the computation.

Now consider a vector addition z = x + y. With pipelining, the x and y
vectors are streamed through the addition unit. Once the pipeline is filled
and steady state reached, a $z_i$ is produced every cycle. In Fig. 1.4.2 we
depict what the pipeline might look like once this steady state is achieved.
In this case, vector speed is about three times scalar speed because the time
for an individual add is three cycles.


1.4.2 Vector Operations


A vector pipeline computer comes with a repertoire of vector instructions, 
such as vector add, vector multiply, vector scale, dot product, and saxpy. 
We assume for clarity that these operations take place in vector registers. 
Vectors travel between the registers and memory by means of vector load 
and vector store instructions. 

An important attribute of a vector processor is the length of its vector
registers, which we designate by $v_L$. A length-$n$ vector operation must be
broken down into subvector operations of length $v_L$ or less. Here is how such
a partitioning might be managed in the case of a vector addition $z = x + y$
where $x$ and $y$ are $n$-vectors:


    first = 1
    while first ≤ n
        last = min{n, first + v_L - 1}
        Vector load x(first:last).
        Vector load y(first:last).
        Vector add: z(first:last) = x(first:last) + y(first:last).
        Vector store z(first:last).
        first = last + 1
    end


A reasonable compiler for a vector computer would automatically generate
these vector instructions from a programmer-specified $z = x + y$ command.
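To make the partitioning concrete, here is a minimal Python sketch of the same strip-mined loop. It is an illustration under stated assumptions, not a model of any particular machine: Python lists stand in for vector registers, `v_len` plays the role of $v_L$, and the name `strip_mined_add` is ours. Indices are 0-based with exclusive slice ends, so the test `first < n` corresponds to `first ≤ n` in the 1-based version.

```python
def strip_mined_add(x, y, v_len):
    """Return (z, number_of_strips) where z = x + y is formed v_len components at a time."""
    n = len(x)
    z = [0.0] * n
    strips = 0
    first = 0                            # 0-based counterpart of "first = 1"
    while first < n:                     # 0-based counterpart of "first <= n"
        last = min(n, first + v_len)     # exclusive end, so last - first <= v_len
        # one strip: vector load x, vector load y, vector add, vector store z
        z[first:last] = [a + b for a, b in zip(x[first:last], y[first:last])]
        strips += 1
        first = last                     # counterpart of "first = last + 1"
    return z, strips


z, strips = strip_mined_add(list(range(10)), [1.0] * 10, v_len=4)
print(strips, z)
```

A length-10 add with 4-word "registers" needs three strips, the last one only two components long.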
1.4.3 The Vector Length Issue


Suppose the pipeline for the vector operation op takes $\tau_{op}$ cycles to “set
up.” Assume that one component of the result is obtained per cycle once
the pipeline is filled. The time required to perform an $n$-dimensional op is
then given by

$$T_{op}(n) = (\tau_{op} + n)\mu, \qquad n \le v_L,$$

where $\mu$ is the cycle time and $v_L$ is the length of the vector hardware.

If the vectors to be combined are longer than the vector hardware length,
then as we have seen the overall vector operation must be broken down into
hardware-manageable chunks. Thus, if

$$n = n_1 v_L + n_0, \qquad 0 \le n_0 < v_L,$$

then we assume that

$$T_{op}(n) = \begin{cases} n_1(\tau_{op} + v_L)\mu & n_0 = 0 \\ \bigl(n_1(\tau_{op} + v_L) + \tau_{op} + n_0\bigr)\mu & n_0 \ne 0 \end{cases}$$

specifies the overall time required to perform a length-$n$ op. This simplifies
to

$$T_{op}(n) = \bigl(n + \tau_{op}\,\mathrm{ceil}(n/v_L)\bigr)\mu$$

where $\mathrm{ceil}(\alpha)$ is the smallest integer such that $\alpha \le \mathrm{ceil}(\alpha)$. If $p$ flops per
component are involved, then the effective rate of computation for general
$n$ is given by

$$R_{op}(n) = \frac{pn}{T_{op}(n)} = \frac{p}{\mu}\,\frac{1}{1 + \frac{\tau_{op}}{n}\,\mathrm{ceil}\!\left(\frac{n}{v_L}\right)}.$$

(If $\mu$ is in seconds, then $R_{op}$ is in flops per second.) The asymptotic rate of
performance is given by

$$\lim_{n \to \infty} R_{op}(n) = \frac{1}{1 + \tau_{op}/v_L}\,\frac{p}{\mu}.$$

As a way of assessing how serious the start-up overhead is for a vector
operation, Hockney and Jesshope (1988) define the quantity $n_{1/2}$ to be the
smallest $n$ for which half of peak performance is achieved, i.e.,

$$\frac{p\,n_{1/2}}{T_{op}(n_{1/2})} = \frac{1}{2}\,\frac{p}{\mu}.$$

Machines that have big $n_{1/2}$ factors do not perform well on short vector
operations.
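The model is easy to tabulate. The sketch below assumes a hypothetical machine (the values of $\tau_{op}$, $v_L$, and $p$ are invented for illustration), and the function names `t_op`, `r_op`, and `n_half` are ours; $n_{1/2}$ is found by direct search, following the definition just given.

```python
import math

def t_op(n, tau, v_len, mu=1.0):
    """Modeled time for a length-n vector op: (n + tau*ceil(n/v_len)) * mu."""
    return (n + tau * math.ceil(n / v_len)) * mu

def r_op(n, p, tau, v_len, mu=1.0):
    """Effective rate R_op(n) = p*n / T_op(n)."""
    return p * n / t_op(n, tau, v_len, mu)

def n_half(p, tau, v_len, mu=1.0, n_max=100_000):
    """Smallest n with R_op(n) >= (1/2)(p/mu); None if no n <= n_max qualifies."""
    target = 0.5 * p / mu
    for n in range(1, n_max + 1):
        if r_op(n, p, tau, v_len, mu) >= target:
            return n
    return None

# Hypothetical machine: 30-cycle start-up, 64-word registers, 2 flops per component.
print(t_op(64, tau=30, v_len=64), t_op(65, tau=30, v_len=64))  # a second strip costs another start-up
print(n_half(p=2, tau=30, v_len=64))
```

Under this model the search returns $n_{1/2} = \tau_{op}$ whenever $\tau_{op} \le v_L$, and nothing when $\tau_{op} > v_L$, because then even the asymptotic rate falls below half of $p/\mu$.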

Let us see what the above performance model says about the design
of the matrix multiply update $C = AB + C$ where $A \in \mathbb{R}^{m\times p}$, $B \in \mathbb{R}^{p\times n}$,
and $C \in \mathbb{R}^{m\times n}$. Recall from §1.1.11 that there are six possible versions of
the conventional algorithm and they correspond to the six possible loop
orderings of
    for i = 1:m
        for j = 1:n
            for k = 1:p
                C(i,j) = A(i,k)B(k,j) + C(i,j)
            end
        end
    end


This is the ijk variant and its innermost loop oversees a length-$p$ dot product.
Thus, our performance model predicts that

$$T_{ijk} = mnp + mn\,\mathrm{ceil}(p/v_L)\,\tau_{dot}$$

cycles are required. A similar analysis for each of the other variants leads
to the following table:


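$$\begin{array}{ll}
\text{Variant} & \text{Cycles} \\ \hline
ijk & mnp + mn\,\tau_{dot}\,\mathrm{ceil}(p/v_L) \\
jik & mnp + mn\,\tau_{dot}\,\mathrm{ceil}(p/v_L) \\
ikj & mnp + mp\,\tau_{sax}\,\mathrm{ceil}(n/v_L) \\
jki & mnp + np\,\tau_{sax}\,\mathrm{ceil}(m/v_L) \\
kij & mnp + mp\,\tau_{sax}\,\mathrm{ceil}(n/v_L) \\
kji & mnp + np\,\tau_{sax}\,\mathrm{ceil}(m/v_L)
\end{array}$$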


We make a few observations based upon some elementary integer arithmetic
manipulation. Assume that $\tau_{sax}$ and $\tau_{dot}$ are roughly equal. If $m$, $n$, and
$p$ are all less than $v_L$, then the most efficient variants will have the longest
inner loops. If $m$, $n$, and $p$ are much bigger than $v_L$, then the distinction
between the six options is small.
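The table can be evaluated directly. The following Python sketch assumes illustrative values for $\tau_{dot}$, $\tau_{sax}$, and $v_L$, and `variant_cycles` is our name. With $m, n, p < v_L$ every inner loop fits in one strip, and the variants with the longest inner loop come out cheapest.

```python
import math

def variant_cycles(m, n, p, v_len, tau_dot, tau_sax):
    """Modeled cycle counts for the six loop orderings of C = AB + C."""
    def strips(length):
        return math.ceil(length / v_len)    # vector instructions per inner loop
    base = m * n * p
    return {
        "ijk": base + m * n * tau_dot * strips(p),   # inner loop: length-p dot product
        "jik": base + m * n * tau_dot * strips(p),
        "ikj": base + m * p * tau_sax * strips(n),   # inner loop: length-n saxpy
        "jki": base + n * p * tau_sax * strips(m),   # inner loop: length-m saxpy
        "kij": base + m * p * tau_sax * strips(n),
        "kji": base + n * p * tau_sax * strips(m),
    }

cycles = variant_cycles(m=10, n=20, p=30, v_len=64, tau_dot=10, tau_sax=10)
best = min(cycles, key=cycles.get)
print(best, cycles)
```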


1.4.4 The Stride Issue 


The “layout” of a vector operand in memory often has a bearing on execution
speed. The key factor is stride. The stride of a stored floating point
vector is the distance (in logical memory locations) between the vector’s
components. Accessing a row in a two-dimensional Fortran array is not a
unit stride operation because arrays are stored by column. In C, it is just
the opposite, as matrices are stored by row. Nonunit stride vector operations
may interfere with the pipelining capability of a computer, degrading
performance.

To clarify the stride issue we consider how the six variants of matrix 
multiplication “pull up” data from the A, B, and C matrices in the inner 
loop. This is where the vector calculation occurs (dot product or saxpy) 
and there are three possibilities: 
    jki or kji:  for i = 1:m
                     C(i,j) = C(i,j) + A(i,k)B(k,j)
                 end
    ikj or kij:  for j = 1:n
                     C(i,j) = C(i,j) + A(i,k)B(k,j)
                 end
    ijk or jik:  for k = 1:p
                     C(i,j) = C(i,j) + A(i,k)B(k,j)
                 end


Here is a table that specifies the A, B, and C strides associated with each 
of these possibilities: 


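$$\begin{array}{l|ccc}
\text{Loop Order} & A\ \text{Stride} & B\ \text{Stride} & C\ \text{Stride} \\ \hline
jki \text{ or } kji & \text{Unit} & 0 & \text{Unit} \\
ikj \text{ or } kij & 0 & \text{Non-Unit} & \text{Non-Unit} \\
ijk \text{ or } jik & \text{Non-Unit} & \text{Unit} & 0
\end{array}$$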


Storage in column-major order is assumed. A stride of zero means that only
a single array element is accessed in the inner loop. From the stride point
of view, it is clear that we should favor the jki and kji variants. This may
not coincide with a preference that is based on vector length considerations.
Dilemmas of this type are typical in high performance computing. One goal
(maximize vector length) can conflict with another (impose unit stride).
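The stride table can be reproduced mechanically. Assuming column-major storage with 0-based offsets and a leading dimension equal to the number of rows (so $X(r,c)$ sits at offset $r + c\cdot\mathrm{ld}$), the stride of an operand in the inner loop is just the change in its offset when the inner index advances by one. The helper names below are illustrative; $n$ does not enter because no array has $n$ rows.

```python
def col_major_offset(r, c, ld):
    """0-based offset of X(r, c) in a column-major array with leading dimension ld."""
    return r + c * ld

def inner_loop_strides(inner, m, p):
    """Strides of A (m-by-p), B (p-by-n), C (m-by-n) when index `inner` advances by one."""
    di, dj, dk = {"i": (1, 0, 0), "j": (0, 1, 0), "k": (0, 0, 1)}[inner]
    i, j, k = 0, 0, 0                       # any base point gives the same differences
    return {
        "A": col_major_offset(i + di, k + dk, m) - col_major_offset(i, k, m),
        "B": col_major_offset(k + dk, j + dj, p) - col_major_offset(k, j, p),
        "C": col_major_offset(i + di, j + dj, m) - col_major_offset(i, j, m),
    }

for inner, variants in (("i", "jki or kji"), ("j", "ikj or kij"), ("k", "ijk or jik")):
    print(variants, inner_loop_strides(inner, m=5, p=7))
```

A stride of 1 is "Unit", 0 means a single element, and anything larger (here $m$ or $p$) is "Non-Unit".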
Sometimes a vector stride/vector length conflict can be resolved through
the intelligent choice of data structures. Consider the gaxpy $y = Ax + y$
where $A \in \mathbb{R}^{n\times n}$ is symmetric. Assume that $n \le v_L$ for simplicity. If
$A$ is stored conventionally and Algorithm 1.1.4 is used, then the central
computation entails $n$ unit stride saxpys, each having length $n$:


    for j = 1:n
        y = A(:,j)x(j) + y
    end


Our simple execution model tells us that

$$T_1 = n(\tau_{sax} + n)$$

cycles are required.


In §1.2.7 we introduced the lower triangular storage scheme for symmetric
matrices and obtained this version of the gaxpy:
    for j = 1:n
        for i = 1:j-1
            y(i) = A.vec((i-1)n - i(i-1)/2 + j)x(j) + y(i)
        end
        for i = j:n
            y(i) = A.vec((j-1)n - j(j-1)/2 + i)x(j) + y(i)
        end
    end


Notice that the first $i$-loop does not define a unit stride saxpy. If we assume
that a length-$n$, nonunit stride saxpy is equivalent to $n$ unit-length saxpys
(a worst case scenario), then this implementation involves

$$T_2 \approx n\left(\frac{n}{2}\,\tau_{sax} + n\right)$$

cycles.
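The following Python sketch mirrors the packed-storage gaxpy above, keeping 1-based packed positions internally so that the two index formulas can be read off directly. The function names are ours, and the packed array is assumed to hold the lower triangle column by column.

```python
def packed_index(i, j, n):
    """1-based position in A.vec of a_ij, i >= j, when the lower triangle is packed by column."""
    return (j - 1) * n - j * (j - 1) // 2 + i

def pack_lower(A):
    """Lower triangle of A, column by column; list index = packed position - 1."""
    n = len(A)
    return [A[i][j] for j in range(n) for i in range(j, n)]

def sym_gaxpy_packed(avec, x, y, n):
    """Return A x + y for symmetric A held only in the packed array avec."""
    y = list(y)
    for j in range(1, n + 1):
        for i in range(1, j):               # a_ij = a_ji: consecutive i are NOT adjacent in avec
            y[i - 1] += avec[packed_index(j, i, n) - 1] * x[j - 1]
        for i in range(j, n + 1):           # a_ij in column j: consecutive i are adjacent (unit stride)
            y[i - 1] += avec[packed_index(i, j, n) - 1] * x[j - 1]
    return y

A = [[4, 1, 2], [1, 5, 3], [2, 3, 6]]
print(sym_gaxpy_packed(pack_lower(A), [1, 2, 3], [0, 0, 0], 3))
```

In the first inner loop successive entries are $n - i$ positions apart, which is exactly the nonunit stride the text warns about.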
In §1.2.8 we developed the store-by-diagonal version: 


    for i = 1:n
        y(i) = A.diag(i)x(i) + y(i)
    end
    for k = 1:n-1
        t = nk - k(k-1)/2
        {y = D(A,k)x + y}
        for i = 1:n-k
            y(i) = A.diag(i+t)x(i+k) + y(i)
        end
        {y = D(A,k)^T x + y}
        for i = 1:n-k
            y(i+k) = A.diag(i+t)x(i) + y(i+k)
        end
    end


In this case both inner loops define a unit stride vector multiply (vm) and
our model of execution predicts

$$T_3 \approx n(2\tau_{vm} + n)$$

cycles.
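Here is a matching Python sketch for the store-by-diagonal scheme, again with names of our own choosing. The array `adiag` is assumed to hold the main diagonal followed by superdiagonals $1, 2, \ldots, n-1$. Indices are 0-based, so `t` is the 0-based offset of the first entry of superdiagonal $k$.

```python
def pack_diagonals(A):
    """Main diagonal of symmetric A, then superdiagonals 1, 2, ..., n-1."""
    n = len(A)
    return [A[i][i + k] for k in range(n) for i in range(n - k)]

def sym_gaxpy_diag(adiag, x, y, n):
    """Return A x + y for symmetric A stored by diagonals; all inner loops are unit stride."""
    y = list(y)
    for i in range(n):                      # main diagonal
        y[i] += adiag[i] * x[i]
    for k in range(1, n):
        t = n * k - k * (k - 1) // 2        # 0-based start of superdiagonal k in adiag
        for i in range(n - k):              # y = D(A,k) x + y
            y[i] += adiag[i + t] * x[i + k]
        for i in range(n - k):              # y = D(A,k)^T x + y, the subdiagonal by symmetry
            y[i + k] += adiag[i + t] * x[i]
    return y

A = [[4, 1, 2], [1, 5, 3], [2, 3, 6]]
print(sym_gaxpy_diag(pack_diagonals(A), [1, 2, 3], [0, 0, 0], 3))
```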

The example shows how the choice of data structure can affect the stride
attributes of an algorithm. Store by diagonal seems attractive because it
represents the matrix compactly and has unit stride. However, a careful
which-is-best analysis would depend upon the values of $\tau_{sax}$ and $\tau_{vm}$ and
the precise penalties for nonunit stride computation and excess storage.
The complexity of the situation would call for careful benchmarking.
1.4.5 Thinking About Data Motion


Another important attribute of a matrix algorithm concerns the actual volume
of data that has to be moved around during execution. Matrices sit
in memory but the computations that involve their entries take place in
functional units. The control of memory traffic is crucial to performance
in many computers. To continue with the factory metaphor used at the
beginning of this section: Can we keep the superfast arithmetic units busy
with enough deliveries of matrix data and can we ship the results back to
memory fast enough to avoid backlog? Fig. 1.4.3 depicts the typical situation
in an advanced uniprocessor environment.

[Fig. 1.4.3 Memory Hierarchy]

Details vary from machine to machine, but two “axioms” prevail:


* Each level in the hierarchy has a limited capacity and for economic
reasons this capacity is usually smaller as we ascend the hierarchy.


* There is a cost, sometimes relatively great, associated with the moving
of data between two levels in the hierarchy.


The design of an efficient matrix algorithm requires careful thinking about 
the flow of data in between the various levels of storage. The vector touch 
and data re-use issues are important in this regard. 


1.4.6 The Vector Touch Issue 


In many advanced computers, data is moved around in chunks, e.g., vectors. 
The time required to read or write a vector to memory is comparable to 
the time required to engage the vector in a dot product or saxpy. Thus, the 
number of vector touches associated with a matrix code is a very important 
statistic. By a “vector touch” we mean either a vector load or store.
Let’s count the number of vector touches associated with an $m$-by-$n$
outer product. Assume that $m = m_1 v_L$ and $n = n_1 v_L$ where $v_L$ is the vector
hardware length. (See §1.4.3.) In this environment, the outer product
update $A = A + xy^T$ would be arranged as follows:


    for α = 1:m_1
        i = (α-1)v_L + 1:αv_L
        for β = 1:n_1
            j = (β-1)v_L + 1:βv_L
            A(i,j) = A(i,j) + x(i)y(j)^T
        end
    end


Each column of the submatrix $A(i,j)$ must be loaded, updated, and then
stored. Not forgetting to account for the vector touches associated with $x$
and $y$, we see that approximately

$$\sum_{\alpha=1}^{m_1}\left(1 + \sum_{\beta=1}^{n_1}(1 + 2v_L)\right) \approx 2mn_1$$

vector touches are required. (Low order terms are neglected.)

Now consider the gaxpy update $y = Ax + y$ where $y \in \mathbb{R}^m$, $x \in \mathbb{R}^n$, and
$A \in \mathbb{R}^{m\times n}$. Breaking this computation down into segments of length $v_L$
gives


    for α = 1:m_1
        i = (α-1)v_L + 1:αv_L
        for β = 1:n_1
            j = (β-1)v_L + 1:βv_L
            y(i) = y(i) + A(i,j)x(j)
        end
    end


Again, each column of submatrix $A(i,j)$ must be read, but the only writing
to memory involves subvectors of $y$. Thus, the number of vector touches
for an $m$-by-$n$ gaxpy is

$$\sum_{\alpha=1}^{m_1}\left(2 + \sum_{\beta=1}^{n_1}(1 + v_L)\right) \approx mn_1.$$

This is half the number required by an identically-sized outer product.
Thus, if a computation can be arranged in terms of either outer products
or gaxpys, then the latter is preferable from the vector touch standpoint.
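The two counts can be checked by instrumenting the blocked loops. This Python sketch only tallies touches, in the same way as the two sums above: one touch per subvector of $x$ or $y$ and one per column of a $v_L$-by-$v_L$ block of $A$. It assumes $m = m_1 v_L$ and $n = n_1 v_L$ and performs no arithmetic. The function names and the sizes are illustrative.

```python
def outer_product_touches(m1, n1, v_len):
    """Vector touches for the blocked update A = A + x y^T."""
    touches = 0
    for _ in range(m1):
        touches += 1                # subvector x(i), loaded once per block row
        for _ in range(n1):
            touches += 1            # subvector y(j)
            touches += 2 * v_len    # each column of A(i,j) is loaded and then stored
    return touches

def gaxpy_touches(m1, n1, v_len):
    """Vector touches for the blocked gaxpy y = A x + y."""
    touches = 0
    for _ in range(m1):
        touches += 2                # y(i) is loaded once and stored once
        for _ in range(n1):
            touches += 1            # subvector x(j)
            touches += v_len        # each column of A(i,j) is loaded, never stored
    return touches

m1, n1, v_len = 8, 8, 64
outer, gaxpy = outer_product_touches(m1, n1, v_len), gaxpy_touches(m1, n1, v_len)
print(outer, gaxpy, outer / gaxpy)   # compare with 2*m*n1 and m*n1, where m = m1*v_len
```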
1.4.7 Blocking and Re-Use


A cache is a small high-speed memory situated between the functional
units and main memory. See Fig. 1.4.3. Cache utilization colors performance
because it has a direct bearing upon how data flows between the
functional units and the lower levels of memory.

To illustrate this we consider the computation of the matrix multiply
update $C = AB + C$ where $A, B, C \in \mathbb{R}^{n\times n}$ reside in main memory. All
data must pass through the cache on its way to the functional units where 
the floating point computations are carried out. If the cache is small and 
n is big, then the update must be broken down into smaller parts so that 
the cache can “gracefully” process the flow of data. 

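One strategy is to block the $B$ and $C$ matrices,

$$B = [\,B_1, \ldots, B_N\,], \qquad C = [\,C_1, \ldots, C_N\,],$$

where each block has $\ell$ columns and we assume that $n = \ell N$. From the expansion

$$C_\alpha = AB_\alpha + C_\alpha = \sum_{k=1}^{n} A(:,k)\,B_\alpha(k,:) + C_\alpha$$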


we obtain the following computational framework:


    for α = 1:N
        Load B_α and C_α into cache.
        for k = 1:n
            Load A(:,k) into cache and update C_α:
                C_α = A(:,k)B_α(k,:) + C_α
        end
        Store C_α in main memory.
    end


Note that if $M$ is the cache size measured in floating point words, then we
must have

$$2n\ell + n \le M. \tag{1.4.1}$$


Let $T_1$ be the number of floating point numbers that flow (in either direction)
between cache and main memory. Note that every entry in $B$ is loaded
into cache once, every entry in $C$ is loaded into cache once and stored back
in main memory once, and every entry in $A$ is loaded into cache $N = n/\ell$
times. It follows that

$$T_1 = 3n^2 + \frac{n^3}{\ell}.$$


"The discussion which follows would also apply if the matrices were on a disk and 
needed to be brought into main memory. 
In the interest of keeping data motion to a minimum, we choose $\ell$ to be as
large as possible subject to the constraint (1.4.1). We therefore set

$$\ell = \frac{M - n}{2n},$$

obtaining

$$T_1 \approx 3n^2 + \frac{2n^4}{M - n}.$$

(We use “≈” to emphasize the approximate nature of our analysis.) If cache
is large enough to house the entire $B$ and $C$ matrices with room left over
for a column of $A$, then $\ell = n$ and $T_1 = 4n^2$. At the other extreme, if we
can just fit three columns in cache, then $\ell = 1$ and $T_1 \approx n^3$.

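Now let us regard $A = (A_{\alpha\beta})$, $B = (B_{\alpha\beta})$, and $C = (C_{\alpha\beta})$ as $N$-by-$N$
block matrices with uniform block size $\ell = n/N$. With this blocking the
computation of

$$C_{\alpha\beta} = \sum_{\gamma=1}^{N} A_{\alpha\gamma}B_{\gamma\beta} + C_{\alpha\beta}, \qquad \alpha = 1{:}N,\ \beta = 1{:}N$$

can be arranged as follows: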


    for α = 1:N
        for β = 1:N
            Load C_αβ into cache.
            for γ = 1:N
                Load A_αγ and B_γβ into cache.
                C_αβ = C_αβ + A_αγ B_γβ
            end
            Store C_αβ in main memory.
        end
    end


In this case the main memory/cache traffic sums to

$$T_2 = 2n^2 + \frac{2n^3}{\ell}$$

because each entry in $A$ and $B$ is loaded $N = n/\ell$ times and each entry
in $C$ is loaded once and stored once. We can minimize this by choosing $\ell$
to be as large as possible subject to the constraint that three blocks fit in
cache, i.e.,

$$3\ell^2 \le M.$$

Setting $\ell \approx \sqrt{M/3}$ gives

$$T_2 \approx 2n^2 + 2n^3\sqrt{\frac{3}{M}}.$$
A manipulation shows that

$$\frac{T_1}{T_2} \approx \frac{3 + 2\,\dfrac{n^2/M}{1 - n/M}}{2 + 2\sqrt{3}\,\sqrt{n^2/M}}.$$


The key quantity here is $n^2/M$, the ratio of matrix size (in floating point
words) to cache size. As this ratio grows we find that

$$\frac{T_1}{T_2} \approx \frac{n}{\sqrt{3M}},$$

showing that the second blocking strategy is superior from the standpoint
of data motion to and from the cache. The fundamental conclusion to be
reached from all of this is that blocking affects data motion.
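To see the two traffic formulas side by side for concrete sizes, the sketch below evaluates $T_1$ and $T_2$ with integer block sizes. This goes slightly beyond the text: when $\ell$ does not divide $n$ we count a partial final block as a full one, via $\mathrm{ceil}(n/\ell)$. The function names and the example sizes are our own choices.

```python
import math

def traffic_column_blocking(n, M):
    """T_1 for the column-block scheme with the widest l satisfying 2*n*l + n <= M."""
    l = min((M - n) // (2 * n), n)
    if l < 1:
        raise ValueError("cache must hold at least three columns (M >= 3n)")
    # B loaded once, C loaded and stored once, A loaded once per block column.
    return 3 * n**2 + n**2 * math.ceil(n / l)

def traffic_square_blocking(n, M):
    """T_2 for the l-by-l block scheme with the largest l satisfying 3*l**2 <= M."""
    l = min(math.isqrt(M // 3), n)
    if l < 1:
        raise ValueError("cache must hold three 1-by-1 blocks (M >= 3)")
    # Entries of A and B loaded once per block column of C; C loaded and stored once.
    return 2 * n**2 + 2 * n**2 * math.ceil(n / l)

n, M = 1000, 30_000
t1, t2 = traffic_column_blocking(n, M), traffic_square_blocking(n, M)
print(t1, t2, t1 / t2, n / math.sqrt(3 * M))
```

For these sizes the ratio $T_1/T_2$ comes out a little above the asymptotic estimate $n/\sqrt{3M} \approx 3.3$.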


1.4.8 Block Matrix Data Structures 


We conclude this section with a discussion about block data structures. A
programming language that supports two-dimensional arrays must have a
convention for storing such a structure in memory. For example, Fortran
stores two-dimensional arrays in column major order. This means that the
entries within a column are contiguous in memory. Thus, if 24 storage
locations are allocated for $A \in \mathbb{R}^{4\times 6}$, then in traditional store-by-column
format the matrix entries are “lined up” in memory as depicted in
Fig. 1.4.4. In other words, if $A \in \mathbb{R}^{m\times n}$ is stored in $v(1{:}mn)$, then we identify
$A(i,j)$ with $v((j-1)m + i)$. For algorithms that access matrix data by
column this is a good arrangement since the column entries are contiguous
in memory.

[Fig. 1.4.4 Store by Column (4-by-6 case)]
[Fig. 1.4.5 Store-by-Blocks (4-by-6 case with 2-by-2 Blocks)]


In certain block matrix algorithms it is sometimes useful to store matri- 
ces by blocks rather than by column. Suppose, for example, that the matrix 
A above is a 2-by-3 block matrix with 2-by-2 blocks. In a store-by-column 
block scheme with store-by-column within each block, the 24 entries are 
arranged in memory as shown in Fig. 1.4.5. This data structure can be
attractive for block algorithms because the entries within a given block are 
contiguous in memory. 
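The two layouts amount to two index maps. In this Python sketch (0-based indices, helper names ours), `block_position` assumes that blocks are ordered by block column, that entries within each block are stored by column as in the 4-by-6 example, and that the block dimensions divide $m$ and $n$. The copy is out of place, so it does not answer the in-place question posed in P1.4.4.

```python
def col_major_position(i, j, m):
    """0-based position of A(i, j) (0-based indices) in store-by-column order."""
    return i + j * m

def block_position(i, j, m, l1, l2):
    """0-based position of A(i, j) when A is stored by l1-by-l2 blocks."""
    blocks_per_col = m // l1
    alpha, beta = i // l1, j // l2          # 0-based block coordinates
    ii, jj = i % l1, j % l2                 # position inside the block
    block_index = alpha + beta * blocks_per_col
    return block_index * (l1 * l2) + ii + jj * l1

def to_block_storage(v, m, n, l1, l2):
    """Copy store-by-column data v into store-by-block order (out of place)."""
    w = [None] * (m * n)
    for j in range(n):
        for i in range(m):
            w[block_position(i, j, m, l1, l2)] = v[col_major_position(i, j, m)]
    return w

# Each entry's value is its store-by-column slot, so w shows where every slot moves.
w = to_block_storage(list(range(24)), 4, 6, 2, 2)
print(w)
```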


Problems 


P1.4.1 Consider the matrix product $D = ABC$ where $A \in \mathbb{R}^{m\times r}$, $B \in \mathbb{R}^{r\times n}$, and
$C \in \mathbb{R}^{n\times q}$. Assume that all the matrices are stored by column and that the time required
to execute a unit-stride saxpy operation of length $k$ is of the form $t(k) = (L + k)\mu$ where $L$
is a constant and $\mu$ is the cycle time. Based on this model, when is it more economical to
compute $D$ as $D = (AB)C$ instead of as $D = A(BC)$? Assume that all matrix multiplies
are done using the jki (gaxpy) algorithm.

P1.4.2 What is the total time spent in the jki variant on the saxpy operations, assuming
that all the matrices are stored by column and that the time required to execute a unit-stride
saxpy operation of length $k$ is of the form $t(k) = (L + k)\mu$ where $L$ is a constant
and $\mu$ is the cycle time? Specialize the algorithm so that it efficiently handles the case
when $A$ and $B$ are $n$-by-$n$ and upper triangular. Does it follow that the triangular
implementation is six times faster, as the flop count suggests?

P1.4.3 Give an algorithm for computing $C = A^TBA$ where $A$ and $B$ are $n$-by-$n$ and
$B$ is symmetric. Arrays should be accessed in unit stride fashion within all innermost
loops.

P1.4.4 Suppose $A \in \mathbb{R}^{m\times n}$ is stored by column in A.col(1:mn). Assume that $m = \ell_1 M$
and $n = \ell_2 N$ and that we regard $A$ as an $M$-by-$N$ block matrix with $\ell_1$-by-$\ell_2$ blocks.
Given $i$, $j$, $\alpha$, and $\beta$ that satisfy $1 \le i \le \ell_1$, $1 \le j \le \ell_2$, $1 \le \alpha \le M$, and $1 \le \beta \le N$,
determine $k$ so that A.col($k$) houses the $(i,j)$ entry of $A_{\alpha\beta}$. Give an algorithm that
overwrites A.col with $A$ stored by block as in Figure 1.4.5. How big of a work array is
required?


Notes and References for Sec. 1.4 
Two excellent expositions about vector computation are 


J.J. Dongarra, F.G. Gustavson, and A. Karp (1984). “Implementing Linear Algebra
Algorithms for Dense Matrices on a Vector Pipeline Machine,” SIAM Review 26,
91-112.

J.M. Ortega and R.G. Voigt (1985). “Solution of Partial Differential Equations on Vector
and Parallel Computers,” SIAM Review 27, 149-240.


A very detailed look at matrix computations in hierarchical memory systems can be 
found in 


K. Gallivan, W. Jalby, U. Meier, and A.H. Sameh (1988). “Impact of Hierarchical Memory
Systems on Linear Algebra Algorithm Design,” Int'l J. Supercomputer Applic.
2, 12-48.


See also 


W. Schönauer (1987). Scientific Computing on Vector Computers, North Holland,
Amsterdam.

R.W. Hockney and C.R. Jesshope (1988). Parallel Computers 2, Adam Hilger, Bristol
and Philadelphia.

where various models of vector processor performance are set forth. Papers on the
practical aspects of vector computing include


J.J. Dongarra and A. Hinds (1979). “Unrolling Loops in Fortran,” Software Practice
and Experience 9, 219-229.

J.J. Dongarra and S. Eisenstat (1984). “Squeezing the Most Out of an Algorithm in 
Cray Fortran,” ACM Trans. Math. Soft. 10, 221-230. 

B.L. Buzbee (1986). “A Strategy for Vectorization,” Parallel Computing 3, 187-192.

K. Gallivan, W. Jalby, and U. Meier (1987). “The Use of BLAS3 in Linear Algebra on a 
Parallel Processor with a Hierarchical Memory,” SIAM J. Sci. and Stat. Comp. 8, 
1079-1084. 

J.J. Dongarra and D. Walker (1995). “Software Libraries for Linear Algebra
Computations on High Performance Computers,” SIAM Review 37, 151-180.


Chapter 2 


Matrix Analysis 


§2.1 Basic Ideas from Linear Algebra
§2.2 Vector Norms
§2.3 Matrix Norms
§2.4 Finite Precision Matrix Computations
§2.5 Orthogonality and the SVD
§2.6 Projections and the CS Decomposition
§2.7 The Sensitivity of Square Linear Systems


The analysis and derivation of algorithms in the matrix computation
area requires a facility with certain aspects of linear algebra. Some of the
basics are reviewed in §2.1. Norms and their manipulation are covered in
§2.2 and §2.3. In §2.4 we develop a model of finite precision arithmetic and
then use it in a typical roundoff analysis.

The next two sections deal with orthogonality, which has a prominent
role to play in matrix computations. The singular value decomposition
and the CS decomposition are a pair of orthogonal reductions that provide
critical insight into the important notions of rank and distance between
subspaces. In §2.7 we examine how the solution of a linear system $Ax = b$
changes if $A$ and $b$ are perturbed. The important concept of matrix
condition is introduced.


Before You Begin
References that complement this chapter include Forsythe and Moler
(1967), Stewart (1973), Stewart and Sun (1990), and Higham (1996).

2.1 Basic Ideas from Linear Algebra 


This section is a quick review of linear algebra. Readers who wish a more
detailed coverage should consult the references at the end of the section.
2.1.1 Independence, Subspace, Basis, and Dimension


A set of vectors $\{a_1, \ldots, a_n\}$ in $\mathbb{R}^m$ is linearly independent if $\sum_{j=1}^{n} \alpha_j a_j = 0$
implies $\alpha(1{:}n) = 0$. Otherwise, a nontrivial combination of the $a_i$ is zero
and $\{a_1, \ldots, a_n\}$ is said to be linearly dependent.

A subspace of $\mathbb{R}^m$ is a subset that is also a vector space. Given a
collection of vectors $a_1, \ldots, a_n \in \mathbb{R}^m$, the set of all linear combinations of
these vectors is a subspace referred to as the span of $\{a_1, \ldots, a_n\}$:

$$\mathrm{span}\{a_1, \ldots, a_n\} = \left\{ \sum_{j=1}^{n} \beta_j a_j : \beta_j \in \mathbb{R} \right\}.$$

If $\{a_1, \ldots, a_n\}$ is independent and $b \in \mathrm{span}\{a_1, \ldots, a_n\}$, then $b$ is a unique
linear combination of the $a_j$.

If $S_1, \ldots, S_k$ are subspaces of $\mathbb{R}^m$, then their sum is the subspace defined
by $S = \{a_1 + a_2 + \cdots + a_k : a_i \in S_i,\ i = 1{:}k\}$. $S$ is said to be a direct sum
if each $v \in S$ has a unique representation $v = a_1 + \cdots + a_k$ with $a_i \in S_i$.
In this case we write $S = S_1 \oplus \cdots \oplus S_k$. The intersection of the $S_i$ is also
a subspace, $S = S_1 \cap S_2 \cap \cdots \cap S_k$.

The subset $\{a_{i_1}, \ldots, a_{i_k}\}$ is a maximal linearly independent subset of
$\{a_1, \ldots, a_n\}$ if it is linearly independent and is not properly contained in any
linearly independent subset of $\{a_1, \ldots, a_n\}$. If $\{a_{i_1}, \ldots, a_{i_k}\}$ is maximal,
then $\mathrm{span}\{a_1, \ldots, a_n\} = \mathrm{span}\{a_{i_1}, \ldots, a_{i_k}\}$ and $\{a_{i_1}, \ldots, a_{i_k}\}$ is a basis
for $\mathrm{span}\{a_1, \ldots, a_n\}$. If $S \subseteq \mathbb{R}^m$ is a subspace, then it is possible to find
independent basic vectors $a_1, \ldots, a_k \in S$ such that $S = \mathrm{span}\{a_1, \ldots, a_k\}$.
All bases for a subspace $S$ have the same number of elements. This number
is the dimension and is denoted by $\dim(S)$.


2.1.2 Range, Null Space, and Rank 


There are two important subspaces associated with an $m$-by-$n$ matrix $A$.
The range of $A$ is defined by

$$\mathrm{ran}(A) = \{y \in \mathbb{R}^m : y = Ax \text{ for some } x \in \mathbb{R}^n\},$$

and the null space of $A$ is defined by

$$\mathrm{null}(A) = \{x \in \mathbb{R}^n : Ax = 0\}.$$

If $A = [\,a_1, \ldots, a_n\,]$ is a column partitioning, then

$$\mathrm{ran}(A) = \mathrm{span}\{a_1, \ldots, a_n\}.$$

The rank of a matrix $A$ is defined by

$$\mathrm{rank}(A) = \dim(\mathrm{ran}(A)).$$

It can be shown that $\mathrm{rank}(A) = \mathrm{rank}(A^T)$. We say that $A \in \mathbb{R}^{m\times n}$ is rank
deficient if $\mathrm{rank}(A) < \min\{m, n\}$. If $A \in \mathbb{R}^{m\times n}$, then

$$\dim(\mathrm{null}(A)) + \mathrm{rank}(A) = n.$$
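For small integer matrices, rank and nullity can be computed exactly. This Python sketch uses Gaussian elimination over the rationals (`fractions.Fraction`), so no roundoff enters; in floating point, rank determination is far more delicate and is usually based on the SVD (§2.5). The function names and the example matrix are ours.

```python
from fractions import Fraction

def rank(A):
    """Rank of a small matrix (list of rows) by exact Gaussian elimination."""
    R = [[Fraction(a) for a in row] for row in A]
    m, n = len(R), len(R[0])
    r = 0                                          # pivots found so far
    for c in range(n):
        piv = next((i for i in range(r, m) if R[i][c] != 0), None)
        if piv is None:
            continue                               # no new pivot in column c
        R[r], R[piv] = R[piv], R[r]
        for i in range(r + 1, m):
            f = R[i][c] / R[r][c]
            R[i] = [a - f * b for a, b in zip(R[i], R[r])]
        r += 1
        if r == m:
            break
    return r

def transpose(A):
    return [list(col) for col in zip(*A)]

A = [[1, 2, 3], [2, 4, 6], [1, 0, 1]]              # row 2 is twice row 1
r = rank(A)
print(r, rank(transpose(A)), len(A[0]) - r)        # rank(A), rank(A^T), dim(null(A))
```

Here the null space is spanned by $(1, 1, -1)^T$, consistent with $\dim(\mathrm{null}(A)) = 3 - 2 = 1$.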
2.1.3 Matrix Inverse

The $n$-by-$n$ identity matrix $I_n$ is defined by the column partitioning

$$I_n = [\,e_1, \ldots, e_n\,]$$

where $e_k$ is the $k$th “canonical” vector:

$$e_k = (\underbrace{0, \ldots, 0}_{k-1}, 1, \underbrace{0, \ldots, 0}_{n-k})^T.$$

The canonical vectors arise frequently in matrix analysis and if their dimension
is ever ambiguous, we use superscripts, i.e., $e_k^{(n)} \in \mathbb{R}^n$.

If $A$ and $X$ are in $\mathbb{R}^{n\times n}$ and satisfy $AX = I$, then $X$ is the inverse of
$A$ and is denoted by $A^{-1}$. If $A^{-1}$ exists, then $A$ is said to be nonsingular.
Otherwise, we say $A$ is singular.

Several matrix inverse properties have an important role to play in matrix
computations. The inverse of a product is the reverse product of the
inverses:

$$(AB)^{-1} = B^{-1}A^{-1}. \tag{2.1.1}$$

The transpose of the inverse is the inverse of the transpose:

$$(A^{-1})^T = (A^T)^{-1} \equiv A^{-T}. \tag{2.1.2}$$

The identity

$$B^{-1} = A^{-1} - B^{-1}(B - A)A^{-1} \tag{2.1.3}$$

shows how the inverse changes if the matrix changes.
The Sherman-Morrison-Woodbury formula gives a convenient expression
for the inverse of $(A + UV^T)$ where $A \in \mathbb{R}^{n\times n}$ and $U$ and $V$ are $n$-by-$k$:

$$(A + UV^T)^{-1} = A^{-1} - A^{-1}U(I + V^TA^{-1}U)^{-1}V^TA^{-1}. \tag{2.1.4}$$

A rank $k$ correction to a matrix results in a rank $k$ correction of the inverse.
In (2.1.4) we assume that both $A$ and $(I + V^TA^{-1}U)$ are nonsingular.

Any of these facts can be verified by just showing that the “proposed”
inverse does the job. For example, here is how to confirm (2.1.3):

$$B\left(A^{-1} - B^{-1}(B - A)A^{-1}\right) = BA^{-1} - (B - A)A^{-1} = I.$$
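Here is a quick numerical check of (2.1.3) and (2.1.4) with NumPy. The random test matrices are assumptions made for illustration: $A$ is shifted by $nI$ so that it is almost certainly nonsingular, and the seed is arbitrary. The name `smw_inverse` is ours, and `np.linalg.solve` is used rather than forming $(I + V^TA^{-1}U)^{-1}$ explicitly.

```python
import numpy as np

def smw_inverse(A_inv, U, V):
    """(A + U V^T)^{-1} from A^{-1} via the Sherman-Morrison-Woodbury formula (2.1.4)."""
    k = U.shape[1]
    S = np.eye(k) + V.T @ A_inv @ U          # the k-by-k matrix I + V^T A^{-1} U
    return A_inv - A_inv @ U @ np.linalg.solve(S, V.T @ A_inv)

rng = np.random.default_rng(0)
n, k = 6, 2
A = rng.standard_normal((n, n)) + n * np.eye(n)   # shifted so it is (very likely) nonsingular
U = rng.standard_normal((n, k))
V = rng.standard_normal((n, k))
A_inv = np.linalg.inv(A)
X = smw_inverse(A_inv, U, V)

print(np.allclose(X @ (A + U @ V.T), np.eye(n)))        # X inverts A + U V^T
print(np.allclose(X, A_inv - X @ (U @ V.T) @ A_inv))    # (2.1.3) with B = A + U V^T
print(np.linalg.matrix_rank(X - A_inv, tol=1e-10))      # rank of the change in the inverse
```

The explicit `tol` in `matrix_rank` keeps roundoff in `X - A_inv` from being counted as extra rank.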


2.1.4 The Determinant 


If $A = (a) \in \mathbb{R}^{1\times 1}$, then its determinant is given by $\det(A) = a$. The
determinant of $A \in \mathbb{R}^{n\times n}$ is defined in terms of order $n-1$ determinants:

$$\det(A) = \sum_{j=1}^{n} (-1)^{j+1} a_{1j} \det(A_{1j}).$$
Here, $A_{1j}$ is an $(n-1)$-by-$(n-1)$ matrix obtained by deleting the first row
and $j$th column of $A$. Useful properties of the determinant include

$$\begin{array}{ll}
\det(AB) = \det(A)\det(B) & A, B \in \mathbb{R}^{n\times n} \\
\det(A^T) = \det(A) & A \in \mathbb{R}^{n\times n} \\
\det(cA) = c^n\det(A) & c \in \mathbb{R},\ A \in \mathbb{R}^{n\times n} \\
\det(A) \ne 0 \iff A \text{ is nonsingular} & A \in \mathbb{R}^{n\times n}
\end{array}$$
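The recursive definition translates directly into code. The Python sketch below takes time exponential in $n$. It is meant only to illustrate the definition and two of the listed properties on small integer matrices chosen for the example. The helper names are ours.

```python
def det_cofactor(A):
    """det(A) by expansion along the first row (exponential cost; illustration only)."""
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in A[1:]]    # delete row 1 and column j+1
        total += (-1) ** j * A[0][j] * det_cofactor(minor)  # (-1)^(1+(j+1)) in 1-based terms
    return total

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

A = [[2, 1, 0], [1, 3, 1], [0, 1, 4]]
B = [[1, 2, 3], [0, 1, 4], [5, 6, 0]]
print(det_cofactor(matmul(A, B)) == det_cofactor(A) * det_cofactor(B))   # det(AB) = det(A)det(B)
print(det_cofactor([[3 * a for a in row] for row in A]) == 3**3 * det_cofactor(A))  # det(cA) = c^n det(A)
```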


2.1.5 Differentiation 

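Suppose $\alpha$ is a scalar and that $A(\alpha)$ is an $m$-by-$n$ matrix with entries $a_{ij}(\alpha)$.
If $a_{ij}(\alpha)$ is a differentiable function of $\alpha$ for all $i$ and $j$, then by $\dot{A}(\alpha)$ we
mean the matrix

$$\dot{A}(\alpha) = \frac{d}{d\alpha}A(\alpha) = \left(\frac{d}{d\alpha}a_{ij}(\alpha)\right) = \left(\dot{a}_{ij}(\alpha)\right).$$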


The differentiation of a parameterized matrix turns out to be a handy way 
to examine the sensitivity of various matrix problems. 
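As a small illustration of the definition, here is a matrix chosen only for the example. Each entry is differentiated separately:

```latex
A(\alpha) = \begin{bmatrix} \alpha^2 & \sin\alpha \\ e^{\alpha} & 1 \end{bmatrix}
\quad\Longrightarrow\quad
\dot{A}(\alpha) = \begin{bmatrix} 2\alpha & \cos\alpha \\ e^{\alpha} & 0 \end{bmatrix}.
```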


Problems 

P2.1.1 Show that if $A \in \mathbb{R}^{m\times n}$ has rank $p$, then there exists an $X \in \mathbb{R}^{m\times p}$ and a
$Y \in \mathbb{R}^{n\times p}$ such that $A = XY^T$, where $\mathrm{rank}(X) = \mathrm{rank}(Y) = p$.

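P2.1.2 Suppose $A(\alpha) \in \mathbb{R}^{m\times r}$ and $B(\alpha) \in \mathbb{R}^{r\times n}$ are matrices whose entries are
differentiable functions of the scalar $\alpha$. Show

$$\frac{d}{d\alpha}\left[A(\alpha)B(\alpha)\right] = \left[\frac{d}{d\alpha}A(\alpha)\right]B(\alpha) + A(\alpha)\left[\frac{d}{d\alpha}B(\alpha)\right].$$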


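P2.1.3 Suppose $A(\alpha) \in \mathbb{R}^{n\times n}$ has entries that are differentiable functions of the scalar
$\alpha$. Assuming $A(\alpha)$ is always nonsingular, show

$$\frac{d}{d\alpha}\left[A(\alpha)^{-1}\right] = -A(\alpha)^{-1}\left[\frac{d}{d\alpha}A(\alpha)\right]A(\alpha)^{-1}.$$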


P2.1.4 Suppose $A \in \mathbb{R}^{n\times n}$, $b \in \mathbb{R}^n$ and that $\phi(x) = \frac{1}{2}x^TAx - x^Tb$. Show that the
gradient of $\phi$ is given by $\nabla\phi(x) = \frac{1}{2}(A^T + A)x - b$.

P2.1.5 Assume that both $A$ and $A + uv^T$ are nonsingular where $A \in \mathbb{R}^{n\times n}$ and $u, v \in \mathbb{R}^n$.
Show that if $x$ solves $(A + uv^T)x = b$, then it also solves a perturbed right hand side
problem of the form $Ax = b + \alpha u$. Give an expression for $\alpha$ in terms of $A$, $u$, and $v$.


Notes and References for Sec. 2.1 


There are many introductory linear algebra texts. Among them, the following are
particularly useful:


P.R. Halmos (1958). Finite Dimensional Vector Spaces, 2nd ed., Van Nostrand-Reinhold, 
Princeton. 
S.J. Leon (1980). Linear Algebra with Applications, Macmillan, New York.

G. Strang (1993). Introduction to Linear Algebra, Wellesley-Cambridge Press, Wellesley,
MA.

D. Lay (1994). Linear Algebra and its Applications, Addison-Wesley, Reading, MA.

C. Meyer (1997). A Course in Applied Linear Algebra, SIAM Publications, Philadelphia,
PA.


More advanced treatments include Gantmacher (1959), Horn and Johnson (1985, 1991), 
and 


A.S. Householder (1964). The Theory of Matrices in Numerical Analysis, Ginn (Blais- 
dell), Boston. 

M. Marcus and H. Minc (1984). A Survey of Matrix Theory and Matrix Inequalities,
Allyn and Bacon, Boston.

J.N. Franklin (1963). Matrix Theory, Prentice Hall, Englewood Cliffs, NJ.

R. Bellman (1970). Introduction to Matrix Analysis, Second Edition, McGraw-Hill, New
York.

P. Lancaster and M. Tismenetsky (1985). The Theory of Matrices, Second Edition, 
Academic Press, New York. 

J.M. Ortega (1987). Matrix Theory: A Second Course, Plenum Press, New York.


2.2 Vector Norms 


Norms serve the same purpose on vector spaces that absolute value does
on the real line: they furnish a measure of distance. More precisely, $\mathbb{R}^n$
together with a norm on $\mathbb{R}^n$ defines a metric space. Therefore, we have the
familiar notions of neighborhood, open sets, convergence, and continuity
when working with vectors and vector-valued functions.


2.2.1 Definitions 


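A vector norm on $\mathbb{R}^n$ is a function $f: \mathbb{R}^n \to \mathbb{R}$ that satisfies the following
properties:

$$\begin{array}{ll}
f(x) \ge 0 & x \in \mathbb{R}^n \quad (f(x) = 0 \text{ iff } x = 0) \\
f(x + y) \le f(x) + f(y) & x, y \in \mathbb{R}^n \\
f(\alpha x) = |\alpha|\,f(x) & \alpha \in \mathbb{R},\ x \in \mathbb{R}^n
\end{array}$$

We denote such a function with a double bar notation: $f(x) = \|x\|$. Subscripts
on the double bar are used to distinguish between various norms.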


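A useful class of vector norms are the $p$-norms defined by

$$\|x\|_p = \left(|x_1|^p + \cdots + |x_n|^p\right)^{1/p}, \qquad p \ge 1. \tag{2.2.1}$$

Of these the 1, 2, and $\infty$ norms are the most important:

$$\begin{aligned}
\|x\|_1 &= |x_1| + \cdots + |x_n| \\
\|x\|_2 &= \left(|x_1|^2 + \cdots + |x_n|^2\right)^{1/2} = (x^Tx)^{1/2} \\
\|x\|_\infty &= \max_{1 \le i \le n} |x_i|
\end{aligned}$$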
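The definition (2.2.1) and the three special cases can be coded in a few lines. In this Python sketch `math.inf` selects the $\infty$-norm. The function name `p_norm` is ours.

```python
import math

def p_norm(x, p):
    """Vector p-norm (2.2.1) for p >= 1; p = math.inf gives the max norm."""
    if p == math.inf:
        return max(abs(t) for t in x)
    if p < 1:
        raise ValueError("p-norms require p >= 1")
    return sum(abs(t) ** p for t in x) ** (1.0 / p)

x = [3.0, -4.0]
print(p_norm(x, 1), p_norm(x, 2), p_norm(x, math.inf))
print(p_norm(x, 50))    # large p: already very close to the infinity-norm
```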
A unit vector with respect to the norm $\|\cdot\|$ is a vector $x$ that satisfies
$\|x\| = 1$.


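2.2.2 Some Vector Norm Properties

A classic result concerning $p$-norms is the Hölder inequality:

$$|x^Ty| \le \|x\|_p\,\|y\|_q, \qquad \frac{1}{p} + \frac{1}{q} = 1. \tag{2.2.2}$$

A very important special case of this is the Cauchy-Schwarz inequality:

$$|x^Ty| \le \|x\|_2\,\|y\|_2. \tag{2.2.3}$$

All norms on $\mathbb{R}^n$ are equivalent, i.e., if $\|\cdot\|_\alpha$ and $\|\cdot\|_\beta$ are norms on
$\mathbb{R}^n$, then there exist positive constants $c_1$ and $c_2$ such that

$$c_1\|x\|_\alpha \le \|x\|_\beta \le c_2\|x\|_\alpha \tag{2.2.4}$$

for all $x \in \mathbb{R}^n$. For example, if $x \in \mathbb{R}^n$, then

$$\|x\|_2 \le \|x\|_1 \le \sqrt{n}\,\|x\|_2 \tag{2.2.5}$$

$$\|x\|_\infty \le \|x\|_2 \le \sqrt{n}\,\|x\|_\infty \tag{2.2.6}$$

$$\|x\|_\infty \le \|x\|_1 \le n\,\|x\|_\infty \tag{2.2.7}$$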
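The inequalities (2.2.5)-(2.2.7) can be spot-checked numerically. This Python sketch samples random vectors (the seed and the dimension are arbitrary) and allows a relative slack of `1e-12` for roundoff. A passing check is evidence, not a proof. The helper names are ours.

```python
import math
import random

def norms_1_2_inf(x):
    """Return (||x||_1, ||x||_2, ||x||_inf)."""
    return (sum(abs(t) for t in x),
            math.sqrt(sum(t * t for t in x)),
            max(abs(t) for t in x))

def satisfies_equivalences(x, slack=1e-12):
    """Check (2.2.5)-(2.2.7) for one vector, up to a small relative slack."""
    n = len(x)
    one, two, inf = norms_1_2_inf(x)
    s = 1 + slack
    return (two <= s * one and one <= s * math.sqrt(n) * two       # (2.2.5)
            and inf <= s * two and two <= s * math.sqrt(n) * inf  # (2.2.6)
            and inf <= s * one and one <= s * n * inf)             # (2.2.7)

random.seed(1)
samples = [[random.uniform(-1.0, 1.0) for _ in range(7)] for _ in range(1000)]
print(all(satisfies_equivalences(x) for x in samples))
```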


2.2.3 Absolute and Relative Error


Suppose $\hat{x} \in \mathbb{R}^n$ is an approximation to $x \in \mathbb{R}^n$. For a given vector norm
$\|\cdot\|$ we say that

$$\epsilon_{abs} = \|\hat{x} - x\|$$

is the absolute error in $\hat{x}$. If $x \ne 0$, then

$$\epsilon_{rel} = \frac{\|\hat{x} - x\|}{\|x\|}$$

prescribes the relative error in $\hat{x}$. Relative error in the $\infty$-norm can be
translated into a statement about the number of correct significant digits
in $\hat{x}$. In particular, if

$$\frac{\|\hat{x} - x\|_\infty}{\|x\|_\infty} \approx 10^{-p},$$

then the largest component of $\hat{x}$ has approximately $p$ correct significant
digits.


Example 2.2.1 If $x = (1.234\ \ .05674)^T$ and $\hat{x} = (1.235\ \ .05129)^T$, then $\|\hat{x} - x\|_\infty/\|x\|_\infty$
$\approx .0044 \approx 10^{-3}$. Note that $\hat{x}_1$ has about three significant digits that are correct while
only one significant digit in $\hat{x}_2$ is correct.
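The numbers in Example 2.2.1 can be reproduced as follows. The componentwise relative errors are printed alongside the normwise one to show why $\hat{x}_1$ and $\hat{x}_2$ differ in accuracy. The helper names are ours.

```python
import math

def rel_error_inf(x_hat, x):
    """Normwise relative error ||x_hat - x||_inf / ||x||_inf (assumes x != 0)."""
    return max(abs(a - b) for a, b in zip(x_hat, x)) / max(abs(b) for b in x)

def componentwise_rel_errors(x_hat, x):
    """|x_hat_i - x_i| / |x_i| for each component (assumes every x_i != 0)."""
    return [abs(a - b) / abs(b) for a, b in zip(x_hat, x)]

x = [1.234, 0.05674]
x_hat = [1.235, 0.05129]
err = rel_error_inf(x_hat, x)
print(err, -math.log10(err))                    # normwise error and rough digit count
print(componentwise_rel_errors(x_hat, x))
```

The normwise figure is governed by the largest component, which is why it says nothing about the poor accuracy of $\hat{x}_2$.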
2.2.4 Convergence

We say that a sequence $\{x^{(k)}\}$ of $n$-vectors converges to $x$ if

$$\lim_{k \to \infty} \|x^{(k)} - x\| = 0.$$

Note that because of (2.2.4), convergence in the $\alpha$-norm implies convergence
in the $\beta$-norm and vice versa.


Problems 


P2.2.1 Show that if $x \in \mathbb{R}^n$, then $\lim_{p\to\infty} \|x\|_p = \|x\|_\infty$.

P2.2.2 Prove the Cauchy-Schwarz inequality (2.2.3) by considering the inequality
$0 \le (ax + by)^T(ax + by)$ for suitable scalars $a$ and $b$.

P2.2.3 Verify that $\|\cdot\|_1$, $\|\cdot\|_2$, and $\|\cdot\|_\infty$ are vector norms.

P2.2.4 Verify (2.2.5)-(2.2.7). When is equality achieved in each result?

P2.2.5 Show that in $\mathbb{R}^n$, $x^{(i)} \to x$ if and only if $x_k^{(i)} \to x_k$ for $k = 1{:}n$.

P2.2.6 Show that any vector norm on $\mathbb{R}^n$ is uniformly continuous by verifying the
inequality $\bigl|\,\|x\| - \|y\|\,\bigr| \le \|x - y\|$.

P2.2.7 Let $\|\cdot\|$ be a vector norm on $\mathbb{R}^m$ and assume $A \in \mathbb{R}^{m\times n}$. Show that if
$\mathrm{rank}(A) = n$, then $\|x\|_A = \|Ax\|$ is a vector norm on $\mathbb{R}^n$.

P2.2.8 Let $x$ and $y$ be in $\mathbb{R}^m$ and define $\psi: \mathbb{R} \to \mathbb{R}$ by $\psi(\alpha) = \|x - \alpha y\|_2$. Show that
$\psi$ is minimized when $\alpha = x^Ty/y^Ty$.

P2.2.9 (a) Verify that $\|z\|_p = \left(|z_1|^p + \cdots + |z_n|^p\right)^{1/p}$ is a vector norm on $\mathbb{C}^n$. (b) Show
that if $z \in \mathbb{C}^n$ then $\|z\|_p \le c\left(\|\mathrm{Re}(z)\|_p + \|\mathrm{Im}(z)\|_p\right)$. (c) Find a constant $c_n$ such
that $c_n\left(\|\mathrm{Re}(z)\|_2 + \|\mathrm{Im}(z)\|_2\right) \le \|z\|_2$ for all $z \in \mathbb{C}^n$.

P2.2.10 Prove or disprove:

$$v \in \mathbb{R}^n \;\Rightarrow\; \|v\|_1\,\|v\|_\infty \le \frac{1 + \sqrt{n}}{2}\,\|v\|_2^2.$$


Notes and References for Sec. 2.2 


Although vector norm is “just” a generalization of the absolute value concept, there
are some noteworthy subtleties:


J.D. Pryce (1984). “A New Measure of Relative Error for Vectors,” SIAM J. Num.
Anal. 21, 202-21.


2.3 Matrix Norms 


The analysis of matrix algorithms frequently requires use of matrix norms.
For example, the quality of a linear system solver may be poor if the matrix
of coefficients is “nearly singular.” To quantify the notion of near-singularity
we need a measure of distance on the space of matrices. Matrix
norms provide that measure.




2.3.1 Definitions 


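Since $\mathbb{R}^{m\times n}$ is isomorphic to $\mathbb{R}^{mn}$, the definition of a matrix norm should be
equivalent to the definition of a vector norm. In particular, $f: \mathbb{R}^{m\times n} \to \mathbb{R}$
is a matrix norm if the following three properties hold:

$$\begin{array}{ll}
f(A) \ge 0 & A \in \mathbb{R}^{m\times n} \quad (f(A) = 0 \text{ iff } A = 0) \\
f(A + B) \le f(A) + f(B) & A, B \in \mathbb{R}^{m\times n} \\
f(\alpha A) = |\alpha|\,f(A) & \alpha \in \mathbb{R},\ A \in \mathbb{R}^{m\times n}
\end{array}$$

As with vector norms, we use a double bar notation with subscripts to
designate matrix norms, i.e., $\|A\| = f(A)$.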
The most frequently used matrix norms in numerical linear algebra are
the Frobenius norm,

$$\|A\|_F = \sqrt{\sum_{i=1}^{m}\sum_{j=1}^{n} |a_{ij}|^2}, \tag{2.3.1}$$

and the $p$-norms

$$\|A\|_p = \sup_{x \ne 0} \frac{\|Ax\|_p}{\|x\|_p}. \tag{2.3.2}$$

Note that the matrix $p$-norms are defined in terms of the vector $p$-norms
that we discussed in the previous section. The verification that (2.3.1) and
(2.3.2) are matrix norms is left as an exercise. It is clear that $\|A\|_p$ is
the $p$-norm of the largest vector obtained by applying $A$ to a unit $p$-norm
vector:

$$\|A\|_p = \sup_{x \ne 0} \left\| A\left(\frac{x}{\|x\|_p}\right) \right\|_p = \max_{\|x\|_p = 1} \|Ax\|_p.$$

It is important to understand that (2.3.1) and (2.3.2) define families
of norms: the 2-norm on $\mathbb{R}^{3\times 2}$ is a different function from the 2-norm on
$\mathbb{R}^{5\times 6}$. Thus, the easily verified inequality

$$\|AB\|_p \le \|A\|_p\,\|B\|_p, \qquad A \in \mathbb{R}^{m\times n},\ B \in \mathbb{R}^{n\times q} \tag{2.3.3}$$

is really an observation about the relationship between three different norms.
Formally, we say that norms $f_1$, $f_2$, and $f_3$ on $\mathbb{R}^{m\times q}$, $\mathbb{R}^{m\times n}$, and $\mathbb{R}^{n\times q}$ are
mutually consistent if for all $A \in \mathbb{R}^{m\times n}$ and $B \in \mathbb{R}^{n\times q}$ we have
$f_1(AB) \le f_2(A)f_3(B)$.

Not all matrix norms satisfy the submultiplicative property

$$\|AB\| \le \|A\|\,\|B\|. \qquad (2.3.4)$$

For example, if $\|A\|_\Delta = \max|a_{ij}|$ and

$$A = B = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix},$$

then $\|AB\|_\Delta > \|A\|_\Delta\|B\|_\Delta$. For the most part we work with norms that
satisfy (2.3.4).
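The counterexample is easy to check by machine. The sketch below is plain Python with matrices stored as lists of rows. The helper names `matmul` and `max_abs_norm` are our own and do not come from any library.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]


def max_abs_norm(A):
    """Entrywise max norm ||A||_Delta = max_ij |a_ij|."""
    return max(abs(a) for row in A for a in row)


A = [[1, 1], [1, 1]]
B = [[1, 1], [1, 1]]
lhs = max_abs_norm(matmul(A, B))          # AB has every entry equal to 2
rhs = max_abs_norm(A) * max_abs_norm(B)   # 1 * 1
print(lhs, rhs)  # prints "2 1": the submultiplicative property (2.3.4) fails
```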

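The p-norms have the important property that for every $A \in \mathbb{R}^{m\times n}$ and
$x \in \mathbb{R}^n$ we have $\|Ax\|_p \le \|A\|_p\|x\|_p$. More generally, for any vector
norm $\|\cdot\|_\alpha$ on $\mathbb{R}^n$ and $\|\cdot\|_\beta$ on $\mathbb{R}^m$ we have $\|Ax\|_\beta \le \|A\|_{\alpha,\beta}\|x\|_\alpha$,
where $\|A\|_{\alpha,\beta}$ is a matrix norm defined by

$$\|A\|_{\alpha,\beta} = \sup_{x \ne 0}\frac{\|Ax\|_\beta}{\|x\|_\alpha}. \qquad (2.3.5)$$

We say that $\|\cdot\|_{\alpha,\beta}$ is subordinate to the vector norms $\|\cdot\|_\alpha$ and $\|\cdot\|_\beta$.
Since the set $\{x \in \mathbb{R}^n : \|x\|_\alpha = 1\}$ is compact and $\|\cdot\|_\beta$ is continuous, it
follows that

$$\|A\|_{\alpha,\beta} = \max_{\|x\|_\alpha = 1}\|Ax\|_\beta = \|Ax^*\|_\beta \qquad (2.3.6)$$

for some $x^* \in \mathbb{R}^n$ having unit $\alpha$-norm.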


2.3.2 Some Matrix Norm Properties 


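The Frobenius and p-norms (especially $p = 1, 2, \infty$) satisfy certain inequalities
that are frequently used in the analysis of matrix computations. For
$A \in \mathbb{R}^{m\times n}$ we have

$$\|A\|_2 \le \|A\|_F \le \sqrt{n}\,\|A\|_2 \qquad (2.3.7)$$
$$\max_{i,j}|a_{ij}| \le \|A\|_2 \le \sqrt{mn}\,\max_{i,j}|a_{ij}| \qquad (2.3.8)$$
$$\|A\|_1 = \max_{1 \le j \le n}\sum_{i=1}^{m}|a_{ij}| \qquad (2.3.9)$$
$$\|A\|_\infty = \max_{1 \le i \le m}\sum_{j=1}^{n}|a_{ij}| \qquad (2.3.10)$$
$$\frac{1}{\sqrt{n}}\|A\|_\infty \le \|A\|_2 \le \sqrt{m}\,\|A\|_\infty \qquad (2.3.11)$$
$$\frac{1}{\sqrt{m}}\|A\|_1 \le \|A\|_2 \le \sqrt{n}\,\|A\|_1 \qquad (2.3.12)$$

If $A \in \mathbb{R}^{m\times n}$, $1 \le i_1 \le i_2 \le m$, and $1 \le j_1 \le j_2 \le n$, then

$$\|A(i_1{:}i_2,\, j_1{:}j_2)\|_p \le \|A\|_p. \qquad (2.3.13)$$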
The proofs of these relations are not hard and are left as exercises. 
A sequence $\{A^{(k)}\} \in \mathbb{R}^{m\times n}$ converges if $\lim_{k\to\infty}\|A^{(k)} - A\| = 0$.
Choice of norm is irrelevant since all norms on $\mathbb{R}^{m\times n}$ are equivalent.


2.3.3 The Matrix 2-Norm 


A nice feature of the matrix 1-norm and the matrix ∞-norm is that they
are easily computed from (2.3.9) and (2.3.10). A characterization of the
2-norm is considerably more complicated.


Theorem 2.3.1 If $A \in \mathbb{R}^{m\times n}$, then there exists a unit 2-norm n-vector $z$
such that $A^TAz = \mu^2 z$ where $\mu = \|A\|_2$.


Proof. Suppose $z \in \mathbb{R}^n$ is a unit vector such that $\|Az\|_2 = \|A\|_2$. Since
$z$ maximizes the function

$$g(x) = \frac{1}{2}\,\frac{\|Ax\|_2^2}{\|x\|_2^2} = \frac{1}{2}\,\frac{x^TA^TAx}{x^Tx},$$

it follows that it satisfies $\nabla g(z) = 0$ where $\nabla g$ is the gradient of $g$. But a
tedious differentiation shows that for $i = 1{:}n$

$$\frac{\partial g(z)}{\partial z_i} = \left[(z^Tz)\sum_{j=1}^{n}(A^TA)_{ij}z_j - (z^TA^TAz)\,z_i\right] \Big/ (z^Tz)^2.$$

In vector notation this says $A^TAz = (z^TA^TAz)z$. The theorem follows by
setting $\mu = \|Az\|_2$. □


The theorem implies that $\|A\|_2^2$ is a zero of the polynomial $p(\lambda) =
\det(A^TA - \lambda I)$. In particular, the 2-norm of $A$ is the square root of the
largest eigenvalue of $A^TA$. We have much more to say about eigenvalues in
Chapters 7 and 8. For now, we merely observe that 2-norm computation
is iterative and decidedly more complicated than the computation of the
matrix 1-norm or ∞-norm. Fortunately, if the object is to obtain an
order-of-magnitude estimate of $\|A\|_2$, then (2.3.7), (2.3.11), or (2.3.12) can be
used.
As another example of “norm analysis,” here is a handy result for
2-norm estimation.


Corollary 2.3.2 If $A \in \mathbb{R}^{m\times n}$, then $\|A\|_2 \le \sqrt{\|A\|_1\|A\|_\infty}$.


Proof. If $z \ne 0$ is such that $A^TAz = \mu^2 z$ with $\mu = \|A\|_2$, then
$\mu^2\|z\|_1 = \|A^TAz\|_1 \le \|A^T\|_1\|A\|_1\|z\|_1 = \|A\|_\infty\|A\|_1\|z\|_1$. □
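The 1-, ∞-, and Frobenius norms come from direct formulas. The 2-norm does not, so the plain-Python sketch below estimates $\|A\|_2$ by power iteration on $A^TA$. This iteration converges to the eigenvector promised by Theorem 2.3.1 provided the starting vector is not orthogonal to it. Power iteration is our choice of illustration, not a method prescribed in this section. The test matrix is the one in Example 2.5.1, whose 2-norm is 3. The function names are our own.

```python
import math


def norm_1(A):
    """Maximum absolute column sum, (2.3.9)."""
    return max(sum(abs(row[j]) for row in A) for j in range(len(A[0])))


def norm_inf(A):
    """Maximum absolute row sum, (2.3.10)."""
    return max(sum(abs(a) for a in row) for row in A)


def norm_fro(A):
    """Frobenius norm, (2.3.1)."""
    return math.sqrt(sum(a * a for row in A for a in row))


def norm_2_estimate(A, iters=200):
    """Estimate ||A||_2 by power iteration on A^T A (no convergence test)."""
    m, n = len(A), len(A[0])
    x = [1.0] * n
    for _ in range(iters):
        y = [sum(A[i][j] * x[j] for j in range(n)) for i in range(m)]   # y = A x
        z = [sum(A[i][j] * y[i] for i in range(m)) for j in range(n)]   # z = A^T y
        nz = math.sqrt(sum(t * t for t in z))
        if nz == 0.0:
            return 0.0
        x = [t / nz for t in z]                                          # unit 2-norm
    y = [sum(A[i][j] * x[j] for j in range(n)) for i in range(m)]
    return math.sqrt(sum(t * t for t in y))                              # ||A x||_2


A = [[0.96, 1.72], [2.28, 0.96]]
s2 = norm_2_estimate(A)
print(norm_1(A), norm_inf(A), norm_fro(A), s2)
```

For this matrix $\|A\|_1 = \|A\|_\infty = 3.24$. Corollary 2.3.2 therefore bounds the 2-norm by 3.24, while the actual value is 3.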
2.3.4 Perturbations and the Inverse


We frequently use norms to quantify the effect of perturbations or to prove
that a sequence of matrices converges to a specified limit. As an illustration
of these norm applications, let us quantify the change in $A^{-1}$ as a function
of change in $A$.

Lemma 2.3.3 If $F \in \mathbb{R}^{n\times n}$ and $\|F\|_p < 1$, then $I - F$ is nonsingular
and

$$(I - F)^{-1} = \sum_{k=0}^{\infty} F^k$$

with

$$\|(I - F)^{-1}\|_p \le \frac{1}{1 - \|F\|_p}.$$
Proof. Suppose $I - F$ is singular. It follows that $(I - F)x = 0$ for some
nonzero $x$. But then $\|x\|_p = \|Fx\|_p$ implies $\|F\|_p \ge 1$, a contradiction.
Thus, $I - F$ is nonsingular. To obtain an expression for its inverse consider
the identity

$$\left(\sum_{k=0}^{N} F^k\right)(I - F) = I - F^{N+1}.$$

Since $\|F\|_p < 1$ it follows that $\lim_{k\to\infty} F^k = 0$ because $\|F^k\|_p \le \|F\|_p^k$.
Thus,

$$\left(\lim_{N\to\infty}\sum_{k=0}^{N} F^k\right)(I - F) = I.$$

It follows that $(I - F)^{-1} = \lim_{N\to\infty}\sum_{k=0}^{N} F^k$. From this it is easy to show that

$$\|(I - F)^{-1}\|_p \le \sum_{k=0}^{\infty}\|F\|_p^k = \frac{1}{1 - \|F\|_p}. \qquad \square$$

Note that $\|(I - F)^{-1} - I\|_p \le \|F\|_p/(1 - \|F\|_p)$ as a consequence
of the lemma. Thus, if $\epsilon \ll 1$, then $O(\epsilon)$ perturbations in $I$ induce $O(\epsilon)$
perturbations in the inverse. We next extend this result to general matrices.
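Lemma 2.3.3 can be checked numerically. The sketch below takes $p = \infty$ and an arbitrary 2-by-2 matrix $F$ with $\|F\|_\infty = 0.4$. It compares partial sums of $\sum F^k$ with the inverse of $I - F$ obtained from the 2-by-2 cofactor formula, and then checks the lemma's bound. The helper names and the fixed number of terms are our own choices.

```python
def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


def norm_inf(X):
    """Maximum absolute row sum, (2.3.10)."""
    return max(sum(abs(a) for a in row) for row in X)


def neumann_inverse(F, terms=60):
    """Partial sum I + F + F^2 + ... + F^(terms-1) of the series in Lemma 2.3.3."""
    n = len(F)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    total = [row[:] for row in identity]
    power = [row[:] for row in identity]
    for _ in range(terms - 1):
        power = matmul(power, F)                                       # F^k
        total = [[s + p for s, p in zip(rs, rp)] for rs, rp in zip(total, power)]
    return total


F = [[0.1, 0.2], [0.3, 0.1]]          # ||F||_inf = 0.4 < 1
S = neumann_inverse(F)

# Exact inverse of I - F = [[a, b], [c, d]] from the 2-by-2 cofactor formula.
a, b, c, d = 1 - F[0][0], -F[0][1], -F[1][0], 1 - F[1][1]
det = a * d - b * c
exact = [[d / det, -b / det], [-c / det, a / det]]

bound = 1 / (1 - norm_inf(F))         # Lemma 2.3.3: ||(I - F)^{-1}|| <= 1/(1 - ||F||)
print(norm_inf(exact), bound)
```

Here the truncation error after 60 terms is at most $0.4^{60}/0.6$, far below the rounding level.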


Theorem 2.3.4 If $A$ is nonsingular and $r \equiv \|A^{-1}E\|_p < 1$, then $A + E$
is nonsingular and $\|(A + E)^{-1} - A^{-1}\|_p \le \|E\|_p\,\|A^{-1}\|_p^2/(1 - r)$.

Proof. Since $A$ is nonsingular, $A + E = A(I - F)$ where $F = -A^{-1}E$.
Since $\|F\|_p = r < 1$ it follows from Lemma 2.3.3 that $I - F$ is nonsingular
and $\|(I - F)^{-1}\|_p \le 1/(1 - r)$. Now $(A + E)^{-1} = (I - F)^{-1}A^{-1}$ and so

$$\|(A + E)^{-1}\|_p \le \frac{\|A^{-1}\|_p}{1 - r}.$$

Equation (2.1.3) says that $(A + E)^{-1} - A^{-1} = -A^{-1}E(A + E)^{-1}$ and so
by taking norms we find

$$\|(A + E)^{-1} - A^{-1}\|_p \le \|A^{-1}\|_p\,\|E\|_p\,\|(A + E)^{-1}\|_p \le \frac{\|A^{-1}\|_p^2\,\|E\|_p}{1 - r}. \qquad \square$$

Problems 


P2.3.1 Show $\|AB\|_p \le \|A\|_p\|B\|_p$ where $1 \le p \le \infty$.

P2.3.2 Let $B$ be any submatrix of $A$. Show that $\|B\|_p \le \|A\|_p$.

P2.3.3 Show that if $D = \mathrm{diag}(\mu_1, \ldots, \mu_k) \in \mathbb{R}^{m\times n}$ with $k = \min\{m, n\}$, then $\|D\|_p
= \max|\mu_i|$.

P2.3.4 Verify (2.3.7) and (2.3.8).

P2.3.5 Verify (2.3.9) and (2.3.10).

P2.3.6 Verify (2.3.11) and (2.3.12).

P2.3.7 Verify (2.3.13).

P2.3.8 Show that if $0 \ne s \in \mathbb{R}^n$ and $E \in \mathbb{R}^{m\times n}$, then

$$\left\|E\left(I - \frac{ss^T}{s^Ts}\right)\right\|_F^2 = \|E\|_F^2 - \frac{\|Es\|_2^2}{s^Ts}.$$

P2.3.9 Suppose $u \in \mathbb{R}^m$ and $v \in \mathbb{R}^n$. Show that if $E = uv^T$ then $\|E\|_F = \|E\|_2 =
\|u\|_2\|v\|_2$ and that $\|E\|_\infty \le \|u\|_\infty\|v\|_1$.

P2.3.10 Suppose $A \in \mathbb{R}^{m\times n}$, $y \in \mathbb{R}^m$, and $0 \ne s \in \mathbb{R}^n$. Show that $E = (y - As)s^T/s^Ts$
has the smallest 2-norm of all m-by-n matrices $E$ that satisfy $(A + E)s = y$.


Notes and References for Sec. 2.3 
For deeper issues concerning matrix/vector norms, see


F.L. Bauer and C.T. Fike (1960). “Norms and Exclusion Theorems,” Numer. Math. 2,
137-44.

L. Mirsky (1960). “Symmetric Gauge Functions and Unitarily Invariant Norms,” Quart.
J. Math. 11, 50-59.

A.S. Householder (1964). The Theory of Matrices in Numerical Analysis, Dover
Publications, New York.

N.J. Higham (1992). “Estimating the Matrix p-Norm,” Numer. Math. 62, 539-556.


2.4 Finite Precision Matrix Computations 


In part, rounding errors are what makes the matrix computation area so 
nontrivial and interesting. In this section we set up a model of floating point 
arithmetic and then use it to develop error bounds for floating point dot 
products, saxpy's, matrix-vector products and matrix-matrix products. For 
a more comprehensive treatment than what we offer, see Higham (1996) or
Wilkinson (1965). The coverage in Forsythe and Moler (1967) and Stewart 
(1973) is also excellent. 


2.4.1 The Floating Point Numbers 


When calculations are performed on a computer, each arithmetic operation
is generally affected by roundoff error. This error arises because the
machine hardware can only represent a subset of the real numbers. We
denote this subset by $F$ and refer to its elements as floating point numbers.
Following conventions set forth in Forsythe, Malcolm, and Moler (1977, pp.
10-29), the floating point number system on a particular computer is
characterized by four integers: the base $\beta$, the precision $t$, and the exponent
range $[L, U]$. In particular, $F$ consists of all numbers $f$ of the form

$$f = \pm .d_1d_2\ldots d_t \times \beta^e, \qquad 0 \le d_i < \beta, \quad d_1 \ne 0, \quad L \le e \le U,$$

together with zero. Notice that for a nonzero $f \in F$ we have $m \le |f| \le M$
where

$$m = \beta^{L-1} \quad\text{and}\quad M = \beta^U(1 - \beta^{-t}). \qquad (2.4.1)$$
As an example, if $\beta = 2$, $t = 3$, $L = 0$, and $U = 2$, then the non-negative
elements of $F$ are represented by hash marks on the axis displayed in Fig.
2.4.1. Notice that the floating point numbers are not equally spaced. A
typical value for $(\beta, t, L, U)$ might be $(2, 56, -64, 64)$.

FIGURE 2.4.1 Sample Floating Point Number System
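The small system behind Fig. 2.4.1 can be listed explicitly. The sketch below enumerates the non-negative elements of $F$ for given $(\beta, t, L, U)$, using exact rational arithmetic from Python's `fractions` module so that no rounding occurs. The function name is our own. The output shows that the spacing doubles from one binade to the next, and it confirms $m = \beta^{L-1}$ and $M = \beta^U(1 - \beta^{-t})$ from (2.4.1).

```python
from fractions import Fraction


def floating_point_system(beta, t, L, U):
    """Sorted non-negative elements of F: +.d1...dt x beta**e with d1 != 0, L <= e <= U."""
    values = {Fraction(0)}
    for e in range(L, U + 1):
        # Integer mantissas beta**(t-1) .. beta**t - 1 are exactly those with d1 != 0.
        for mant in range(beta ** (t - 1), beta ** t):
            values.add(Fraction(mant, beta ** t) * Fraction(beta) ** e)
    return sorted(values)


F = floating_point_system(2, 3, 0, 2)
print([float(f) for f in F])
# [0.0, 0.5, 0.625, 0.75, 0.875, 1.0, 1.25, 1.5, 1.75, 2.0, 2.5, 3.0, 3.5]
```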


2.4.2 A Model of Floating Point Arithmetic 
To make general pronouncements about the effect of rounding errors on a
given algorithm, it is necessary to have a model of computer arithmetic on
$F$. To this end define the set $G$ by

$$G = \{x \in \mathbb{R} : m \le |x| \le M\} \cup \{0\} \qquad (2.4.2)$$

and the operator $fl: G \to F$ by

$$fl(x) = \text{nearest } c \in F \text{ to } x, \text{ with ties handled by rounding away from zero.}$$

The $fl$ operator can be shown to satisfy

$$fl(x) = x(1 + \epsilon), \qquad |\epsilon| \le u \qquad (2.4.3)$$

where $u$ is the unit roundoff defined by

$$u = \tfrac{1}{2}\beta^{1-t}. \qquad (2.4.4)$$


Let $a$ and $b$ be any two floating point numbers and let “op” denote any
of the four arithmetic operations $+, -, \times, \div$. If $a \text{ op } b \in G$, then in our
model of floating point arithmetic we assume that the computed version of
$(a \text{ op } b)$ is given by $fl(a \text{ op } b)$. It follows that $fl(a \text{ op } b) = (a \text{ op } b)(1 + \epsilon)$
with $|\epsilon| \le u$. Thus,

$$\frac{|fl(a \text{ op } b) - (a \text{ op } b)|}{|a \text{ op } b|} \le u, \qquad a \text{ op } b \ne 0 \qquad (2.4.5)$$

showing that there is small relative error associated with individual
arithmetic operations.¹ It is important to realize, however, that this is not
necessarily the case when a sequence of operations is involved.


Example 2.4.1 If $\beta = 10$, $t = 3$ floating point arithmetic is used, then it can be shown
that $fl[fl(10^{-4} + 1) - 1] = 0$, implying a relative error of 1. On the other hand, the
exact answer is given by $fl[10^{-4} + fl(1 - 1)] = 10^{-4}$. Floating point arithmetic is
not always associative.
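Python's `decimal` module can emulate the $\beta = 10$, $t = 3$ arithmetic of Example 2.4.1. A `Context` with `prec=3` rounds every operation to three significant digits, and `ROUND_HALF_UP` rounds ties away from zero, as in the definition of $fl$. The exponent range $[L, U]$ is not modelled, which is harmless here. The last lines relate (2.4.4) to Python's own floats, assumed here to be IEEE doubles with $\beta = 2$ and $t = 53$. IEEE arithmetic rounds ties to even rather than away from zero, but (2.4.3) still holds.

```python
import sys
from decimal import Decimal, Context, ROUND_HALF_UP

# Hypothetical beta = 10, t = 3 machine: each operation rounds its exact result
# to 3 significant digits, ties away from zero.  [L, U] is not modelled.
ctx = Context(prec=3, rounding=ROUND_HALF_UP)

tiny, one = Decimal("1E-4"), Decimal(1)
left = ctx.subtract(ctx.add(tiny, one), one)    # fl[fl(10^-4 + 1) - 1]
right = ctx.add(tiny, ctx.subtract(one, one))   # fl[10^-4 + fl(1 - 1)]
print(left, right)  # the first is zero (relative error 1), the second is 10^-4

# Unit roundoff (2.4.4) for Python floats, assumed to be IEEE doubles.
t = sys.float_info.mant_dig                     # 53 on IEEE platforms
u = 0.5 * 2.0 ** (1 - t)
print(t, u == sys.float_info.epsilon / 2)
```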


If $a \text{ op } b \notin G$, then an arithmetic exception occurs. Overflow and
underflow result whenever $|a \text{ op } b| > M$ or $0 < |a \text{ op } b| < m$, respectively.
The handling of these and other exceptions is hardware/system dependent.


2.4.3 Cancellation 


Another important aspect of finite precision arithmetic is the phenomenon
of catastrophic cancellation. Roughly speaking, this term refers to the
extreme loss of correct significant digits when small numbers are additively
computed from large numbers. A well-known example taken from Forsythe,
Malcolm and Moler (1977, pp. 14-16) is the computation of $e^{-a}$ via Taylor
series with $a > 0$. The roundoff error associated with this method is
approximately $u$ times the largest partial sum. For large $a$, this error can
actually be larger than the exact exponential and there will be no correct
digits in the answer no matter how many terms in the series are summed.
On the other hand, if enough terms in the Taylor series for $e^{a}$ are added and
the result reciprocated, then an estimate of $e^{-a}$ to full precision is attained.

¹ There are important examples of machines whose additive floating point operations
satisfy $fl(a + b) = (1 + \epsilon_1)a + (1 + \epsilon_2)b$ where $|\epsilon_1|, |\epsilon_2| \le u$. In such an environment,
the inequality $|fl(a + b) - (a + b)| \le u|a + b|$ need not hold.
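The cancellation effect can be reproduced in IEEE double precision ($u = 2^{-53}$). The sketch below takes $a = 20$ and computes $e^{-20}$ in two ways. First, it sums the Taylor series for $e^{-20}$ directly. Second, it sums the series for $e^{20}$ and takes the reciprocal. Both results are compared with `math.exp`, which serves as the reference value. The function name `exp_taylor` and the fixed term count are our own choices.

```python
import math


def exp_taylor(x, max_terms=200):
    """Sum the Taylor series of e**x term by term in double precision."""
    total, term = 1.0, 1.0
    for k in range(1, max_terms):
        term *= x / k          # term = x**k / k!
        total += term
    return total


a = 20.0
direct = exp_taylor(-a)            # alternating series: partial sums reach about 4e7
reciprocal = 1.0 / exp_taylor(a)   # all terms positive: no cancellation
exact = math.exp(-a)               # about 2.06e-9
rel_direct = abs(direct - exact) / exact
rel_recip = abs(reciprocal - exact) / exact
print(rel_direct, rel_recip)
```

The largest term is about $20^{20}/20! \approx 4.3 \times 10^7$. Rounding errors of size $u \cdot 4.3 \times 10^7 \approx 5 \times 10^{-9}$ therefore swamp the answer $e^{-20} \approx 2 \times 10^{-9}$, while the reciprocal approach is accurate to a few units of $u$.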


2.4.4 The Absolute Value Notation


Before we proceed with the roundoff analysis of some basic matrix
calculations, we acquire some useful notation. Suppose $A \in \mathbb{R}^{m\times n}$ and that we
wish to quantify the errors associated with its floating point representation.
Denoting the stored version of $A$ by $fl(A)$, we see that

$$[fl(A)]_{ij} = fl(a_{ij}) = a_{ij}(1 + \epsilon_{ij}), \qquad |\epsilon_{ij}| \le u \qquad (2.4.6)$$

for all $i$ and $j$. A better way to say the same thing results if we adopt two
conventions. If $A$ and $B$ are in $\mathbb{R}^{m\times n}$, then

$$B = |A| \;\Longleftrightarrow\; b_{ij} = |a_{ij}|, \qquad i = 1{:}m,\ j = 1{:}n$$
$$B \le A \;\Longleftrightarrow\; b_{ij} \le a_{ij}, \qquad i = 1{:}m,\ j = 1{:}n.$$

With this notation we see that (2.4.6) has the form

$$|fl(A) - A| \le u|A|.$$

A relation such as this can be easily turned into a norm inequality, e.g.,
$\|fl(A) - A\|_1 \le u\|A\|_1$. However, when quantifying the rounding errors
in a matrix manipulation, the absolute value notation can be a lot more
informative because it provides a comment on each $(i, j)$ entry.


2.4.5 Roundoff in Dot Products 


We begin our study of finite precision matrix computations by considering 
the rounding errors that result in the standard dot product algorithm: 


    s = 0
    for k = 1:n
        s = s + x(k)y(k)                                  (2.4.7)
    end


Here, $x$ and $y$ are n-by-1 floating point vectors.

In trying to quantify the rounding errors in this algorithm, we are
immediately confronted with a notational problem: the distinction between
computed and exact quantities. When the underlying computations
are clear, we shall use the $fl(\cdot)$ operator to signify computed quantities.
Thus, $fl(x^Ty)$ denotes the computed output of (2.4.7). Let us bound
$|fl(x^Ty) - x^Ty|$. If

$$s_p = fl\left(\sum_{k=1}^{p} x_ky_k\right),$$

then $s_1 = x_1y_1(1 + \delta_1)$ with $|\delta_1| \le u$ and for $p = 2{:}n$

$$s_p = fl(s_{p-1} + fl(x_py_p)) = (s_{p-1} + x_py_p(1 + \delta_p))(1 + \epsilon_p), \qquad |\delta_p|, |\epsilon_p| \le u. \qquad (2.4.8)$$

A little algebra shows that

$$fl(x^Ty) = s_n = \sum_{k=1}^{n} x_ky_k(1 + \gamma_k)$$

where

$$(1 + \gamma_k) = (1 + \delta_k)\prod_{j=k}^{n}(1 + \epsilon_j)$$

with the convention that $\epsilon_1 = 0$. Thus,

$$|fl(x^Ty) - x^Ty| \le \sum_{k=1}^{n}|x_ky_k|\,|\gamma_k|. \qquad (2.4.9)$$

To proceed further, we must bound the quantities $|\gamma_k|$ in terms of $u$. The
following result is useful for this purpose.

Lemma 2.4.1 If $(1 + \alpha) = \prod_{k=1}^{n}(1 + \alpha_k)$ where $|\alpha_k| \le u$ and $nu \le .01$, then
$|\alpha| \le 1.01nu$.


Proof. See Higham (1996, p. 75). □


Applying this result to (2.4.9) under the “reasonable” assumption $nu \le .01$
gives

$$|fl(x^Ty) - x^Ty| \le 1.01nu\,|x|^T|y|. \qquad (2.4.10)$$

Notice that if $|x^Ty| \ll |x|^T|y|$, then the relative error in $fl(x^Ty)$ may not
be small.
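The bound (2.4.10) can be checked by emulating a low-precision machine. The sketch below assumes a hypothetical $\beta = 10$, $t = 4$ arithmetic built with Python's `decimal` module. It runs (2.4.7) with every product and partial sum rounded to four digits, computes the exact dot product with `fractions.Fraction`, and compares the error with $1.01nu|x|^T|y|$. The data vectors are arbitrary 4-digit numbers, and the function names are our own.

```python
from decimal import Decimal, Context, ROUND_HALF_UP
from fractions import Fraction

BETA, T = 10, 4
U_ROUND = Fraction(1, 2) * Fraction(BETA) ** (1 - T)   # u = (1/2) beta^(1-t), (2.4.4)
ctx = Context(prec=T, rounding=ROUND_HALF_UP)


def fl_dot(x, y):
    """Algorithm (2.4.7): every product and every partial sum rounded to T digits."""
    s = Decimal(0)
    for xk, yk in zip(x, y):
        s = ctx.add(s, ctx.multiply(xk, yk))
    return s


def exact_dot(x, y):
    """Exact dot product in rational arithmetic."""
    return sum(Fraction(xk) * Fraction(yk) for xk, yk in zip(x, y))


x = [Decimal(v) for v in ("1.234", "-5.678", "9.012", "3.456", "-7.891")]
y = [Decimal(v) for v in ("2.345", "6.789", "-0.1234", "4.567", "1.111")]
n = len(x)

err = abs(Fraction(fl_dot(x, y)) - exact_dot(x, y))
abs_dot = sum(abs(Fraction(a) * Fraction(b)) for a, b in zip(x, y))   # |x|^T |y|
bound = Fraction(101, 100) * n * U_ROUND * abs_dot                      # (2.4.10)
print(float(err), float(bound))
```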
2.4.6 Alternative Ways to Quantify Roundoff Error 


An easier but less rigorous way of bounding $\alpha$ in Lemma 2.4.1 is to say
$|\alpha| \le nu + O(u^2)$. With this convention we have

$$|fl(x^Ty) - x^Ty| \le nu|x|^T|y| + O(u^2). \qquad (2.4.11)$$

Other ways of expressing the same result include

$$|fl(x^Ty) - x^Ty| \le \phi(n)\,u\,|x|^T|y| \qquad (2.4.12)$$

and

$$|fl(x^Ty) - x^Ty| \le cnu\,|x|^T|y|, \qquad (2.4.13)$$

where in (2.4.12) $\phi(n)$ is a “modest” function of $n$ and in (2.4.13) $c$ is a
constant of order unity.

We shall not express a preference for any of the error bounding styles 
shown in (2.4.10)-(2.4.13). This spares us the necessity of translating the 
roundoff results that appear in the literature into a fixed format. Moreover, 
paying overly close attention to the details of an error bound is inconsistent 
with the “philosophy” of roundoff analysis. As Wilkinson (1971, p. 567)
says,


“There is still a tendency to attach too much importance to the
precise error bounds obtained by an a priori error analysis. In
my opinion, the bound itself is usually the least important part
of it. The main object of such an analysis is to expose the
potential instabilities, if any, of an algorithm so that hopefully
from the insight thus obtained one might be led to improved
algorithms. Usually the bound itself is weaker than it might have
been because of the necessity of restricting the mass of detail
to a reasonable level and because of the limitations imposed by
expressing the errors in terms of matrix norms. A priori bounds
are not, in general, quantities that should be used in practice.
Practical error bounds should usually be determined by some
form of a posteriori error analysis, since this takes full
advantage of the statistical distribution of rounding errors and of any
special features, such as sparseness, in the matrix.”


It is important to keep these perspectives in mind. 


2.4.7 Dot Product Accumulation 


Some computers have provision for accumulating dot products in double
precision. This means that if $x$ and $y$ are floating point vectors with length-$t$
mantissas, then the running sum $s$ in (2.4.7) is built up in a register with
a $2t$-digit mantissa. Since the multiplication of two $t$-digit floating point
numbers can be stored exactly in a double precision variable, it is only
when $s$ is written to single precision memory that any roundoff occurs. In
this situation one can usually assert that a computed dot product has good
relative error, i.e., $fl(x^Ty) = x^Ty(1 + \delta)$ where $|\delta| \approx u$. Thus, the ability
to accumulate dot products is very appealing.
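The same `decimal` emulation shows why accumulation helps. The sketch assumes a hypothetical $t = 3$ machine with a $2t = 6$ digit accumulator. Products of 3-digit numbers are exact in the accumulator, and the result is rounded to 3 digits once, when $s$ is stored. In general the accumulator's own sums can also round, but for these data they do not. The data are chosen so that the ordinary loop (2.4.7) suffers total cancellation. The names `SINGLE`, `DOUBLE`, `dot_single`, and `dot_accumulated` are our own.

```python
from decimal import Decimal, Context, ROUND_HALF_UP

SINGLE = Context(prec=3, rounding=ROUND_HALF_UP)   # t-digit working precision
DOUBLE = Context(prec=6, rounding=ROUND_HALF_UP)   # 2t-digit accumulator


def dot_single(x, y):
    """(2.4.7) with every product and partial sum rounded to t digits."""
    s = Decimal(0)
    for xk, yk in zip(x, y):
        s = SINGLE.add(s, SINGLE.multiply(xk, yk))
    return s


def dot_accumulated(x, y):
    """Products and partial sums kept to 2t digits; one final rounding to t digits."""
    s = Decimal(0)
    for xk, yk in zip(x, y):
        s = DOUBLE.add(s, DOUBLE.multiply(xk, yk))
    return SINGLE.plus(s)   # "store" the result in t-digit memory


x = [Decimal("1.00"), Decimal("1.00"), Decimal("1.00")]
y = [Decimal("100"), Decimal("0.456"), Decimal("-100")]
print(dot_single(x, y), dot_accumulated(x, y))   # the exact value is 0.456
```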
2.4.8 Roundoff in Other Basic Matrix Computations
It is easy to show that if $A$ and $B$ are floating point matrices and $\alpha$ is a
floating point number, then

$$fl(\alpha A) = \alpha A + E, \qquad |E| \le u|\alpha A| \qquad (2.4.14)$$

and

$$fl(A + B) = (A + B) + E, \qquad |E| \le u|A + B|. \qquad (2.4.15)$$

As a consequence of these two results, it is easy to verify that computed
saxpys and outer product updates satisfy

$$fl(\alpha x + y) = \alpha x + y + z, \qquad |z| \le u(2|\alpha x| + |y|) + O(u^2) \qquad (2.4.16)$$

$$fl(C + uv^T) = C + uv^T + E, \qquad |E| \le u(|C| + 2|uv^T|) + O(u^2). \qquad (2.4.17)$$

Using (2.4.10) it is easy to show that a dot product based multiplication of
two floating point matrices $A$ and $B$ satisfies

$$fl(AB) = AB + E, \qquad |E| \le nu|A||B| + O(u^2). \qquad (2.4.18)$$

The same result applies if a gaxpy or outer product based procedure is used.
Notice that matrix multiplication does not necessarily give small relative
error since $|AB|$ may be much smaller than $|A||B|$, e.g.,

$$\begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ -.99 & 0 \end{bmatrix} = \begin{bmatrix} .01 & 0 \\ 0 & 0 \end{bmatrix}.$$

It is easy to obtain norm bounds from the roundoff results developed thus
far. If we look at the 1-norm error in floating point matrix multiplication,
then it is easy to show from (2.4.18) that

$$\|fl(AB) - AB\|_1 \le nu\|A\|_1\|B\|_1 + O(u^2). \qquad (2.4.19)$$


2.4.9 Forward and Backward Error Analyses 


Each roundoff bound given above is the consequence of a forward error
analysis. An alternative style of characterizing the roundoff errors in an
algorithm is accomplished through a technique known as backward error
analysis. Here, the rounding errors are related to the data of the problem
rather than to its solution. By way of illustration, consider the $n = 2$
version of triangular matrix multiplication. It can be shown that:

$$fl(AB) = \begin{bmatrix} a_{11}b_{11}(1 + \epsilon_1) & (a_{11}b_{12}(1 + \epsilon_2) + a_{12}b_{22}(1 + \epsilon_3))(1 + \epsilon_4) \\ 0 & a_{22}b_{22}(1 + \epsilon_5) \end{bmatrix}$$

where $|\epsilon_i| \le u$, for $i = 1{:}5$. However, if we define

$$\hat{A} = \begin{bmatrix} a_{11} & a_{12}(1 + \epsilon_3)(1 + \epsilon_4) \\ 0 & a_{22}(1 + \epsilon_5) \end{bmatrix}$$

and

$$\hat{B} = \begin{bmatrix} b_{11}(1 + \epsilon_1) & b_{12}(1 + \epsilon_2)(1 + \epsilon_4) \\ 0 & b_{22} \end{bmatrix},$$

then it is easily verified that $fl(AB) = \hat{A}\hat{B}$. Moreover,

$$\hat{A} = A + E, \qquad |E| \le 2u|A| + O(u^2)$$
$$\hat{B} = B + F, \qquad |F| \le 2u|B| + O(u^2).$$

In other words, the computed product is the exact product of slightly
perturbed $A$ and $B$.


2.4.10 Error in Strassen Multiplication 


In §1.3.8 we outlined an unconventional matrix multiplication procedure
due to Strassen (1969). It is instructive to compare the effect of roundoff
in this method with the effect of roundoff in any of the conventional matrix
multiplication methods of §1.1.

It can be shown that the Strassen approach (Algorithm 1.3.1) produces a
$\hat{C} = fl(AB)$ that satisfies an inequality of the form (2.4.19). This is
quite satisfactory in many applications. However, the $\hat{C}$ that Strassen's
method produces does not always satisfy an inequality of the form (2.4.18).
To see this, suppose

$$A = B = \begin{bmatrix} .99 & .0010 \\ .0010 & .99 \end{bmatrix}$$

and that we execute Algorithm 1.3.1 using 2-digit floating point arithmetic.
Among other things, the following quantities are computed:

$$\hat{P}_3 = fl(.99(.001 - .99)) = -.98$$
$$\hat{P}_5 = fl((.99 + .001).99) = .98$$
$$\hat{c}_{12} = fl(\hat{P}_3 + \hat{P}_5) = 0.0$$

Now in exact arithmetic $c_{12} = 2(.001)(.99) = .00198$ and thus Algorithm 1.3.1
produces a $\hat{c}_{12}$ with no correct significant digits. The Strassen approach gets
into trouble in this example because small off-diagonal entries are combined
with large diagonal entries. Note that in conventional matrix multiplication
neither $b_{12}$ and $b_{22}$ nor $a_{11}$ and $a_{12}$ are summed. Thus the contribution of
the small off-diagonal elements is not lost. Indeed, for the above $A$ and $B$
a conventional matrix multiply gives $\hat{c}_{12} = .0020$.
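The computation above can be replayed with Python's `decimal` module. A context with `prec=2` and `ROUND_HALF_UP` acts as the 2-digit arithmetic. The Strassen quantities follow the text: $P_3 = a_{11}(b_{12} - b_{22})$, $P_5 = (a_{11} + a_{12})b_{22}$, and $c_{12} = P_3 + P_5$. The conventional value is $c_{12} = a_{11}b_{12} + a_{12}b_{22}$. Only the $(1,2)$ entry is modelled; this is not a full implementation of Algorithm 1.3.1.

```python
from decimal import Decimal, Context, ROUND_HALF_UP

ctx = Context(prec=2, rounding=ROUND_HALF_UP)   # 2-digit arithmetic, ties away from zero
add, sub, mul = ctx.add, ctx.subtract, ctx.multiply

a11 = a22 = Decimal("0.99")
a12 = a21 = Decimal("0.0010")
b11, b12, b21, b22 = a11, a12, a21, a22         # A = B

# Strassen's route to c12: c12 = P3 + P5
p3 = mul(a11, sub(b12, b22))                    # fl(.99(.001 - .99)) = -.98
p5 = mul(add(a11, a12), b22)                    # fl((.99 + .001).99) = .98
c12_strassen = add(p3, p5)                      # all significant digits lost

# Conventional inner product: c12 = a11*b12 + a12*b22
c12_conventional = add(mul(a11, b12), mul(a12, b22))
print(c12_strassen, c12_conventional)           # exact value is .00198
```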

Failure to produce a componentwise accurate $\hat{C}$ can be a serious
shortcoming in some applications. For example, in Markov processes the $a_{ij}$,
$b_{ij}$, and $c_{ij}$ are transition probabilities and are therefore nonnegative. It
may be critical to compute $c_{ij}$ accurately if it reflects a particularly
important probability in the modeled phenomena. Note that if $A \ge 0$ and
$B \ge 0$, then conventional matrix multiplication produces a product $\hat{C}$ that
has small componentwise relative error:

$$|\hat{C} - C| \le nu|A||B| + O(u^2) = nu|C| + O(u^2).$$

This follows from (2.4.18). Because we cannot say the same for the Strassen
approach, we conclude that Algorithm 1.3.1 is not attractive for certain
nonnegative matrix multiplication problems if relatively accurate $\hat{c}_{ij}$ are
required.

Extrapolating from this discussion we reach two fairly obvious but
important conclusions:


* Different methods for computing the same quantity can produce
substantially different results.


* Whether or not an algorithm produces satisfactory results depends
upon the type of problem solved and the goals of the user.


These observations are clarified in subsequent chapters and are intimately 
related to the concepts of algorithm stability and problem condition. 


Problems 


P2.4.1 Show that if (2.4.7) is applied with $y = x$, then $fl(x^Tx) = x^Tx(1 + \alpha)$ where
$|\alpha| \le nu + O(u^2)$.

P2.4.2 Prove (2.4.3).

P2.4.3 Show that if $E \in \mathbb{R}^{m\times n}$ with $m \ge n$, then $\|\,|E|\,\|_2 \le \sqrt{n}\,\|E\|_2$. This result is
useful when deriving norm bounds from absolute value bounds.

P2.4.4 Assume the existence of a square root function satisfying $fl(\sqrt{x}) = \sqrt{x}(1 + \epsilon)$
with $|\epsilon| \le u$. Give an algorithm for computing $\|x\|_2$ and bound the rounding errors.

P2.4.5 Suppose $A$ and $B$ are n-by-n upper triangular floating point matrices. If $\hat{C} =
fl(AB)$ is computed using one of the conventional §1.1 algorithms, does it follow that
$\hat{C} = \hat{A}\hat{B}$ where $\hat{A}$ and $\hat{B}$ are close to $A$ and $B$?

P2.4.6 Suppose $A$ and $B$ are n-by-n floating point matrices and that $A$ is nonsingular
with $\|\,|A^{-1}|\,|A|\,\|_\infty = \tau$. Show that if $\hat{C} = fl(AB)$ is obtained using any of the
algorithms in §1.1, then there exists a $\hat{B}$ so $\hat{C} = A\hat{B}$ and $\|\hat{B} - B\|_\infty \le nu\tau\|B\|_\infty +
O(u^2)$.

P2.4.7 Prove (2.4.18).


68 CHAPTER 2. MATRIX ANALYSIS 


Notes and References for Sec. 2.4 
For a general introduction to the effects of roundoff error, we recommend 


J.H. Wilkinson (1963). Rounding Errors in Algebraic Processes, Prentice-Hall, Engle- 
wood Cliffs, NJ. 

J.H. Wilkinson (1971). “Modern Error Analysis," SIAM Review 13, 548-68. 

D. Kahaner, C.B. Moler, and S. Nash (1988). Numerical Methods and Software, Prentice- 
Hall, Englewood Cliffs, NJ. 

F. Chaitin-Chatelin and V. Frayssé (1996). Lectures on Finite Precision Computations,
SIAM Publications, Philadelphia.


More recent developments in error analysis involve interval analysis, the building of
statistical models of roundoff error, and the automating of the analysis itself:


T.E. Hull and J.R. Swenson (1966). “Tests of Probabilistic Models for Propagation of
Roundoff Errors,” Comm. ACM 9, 108-13.

J. Larson and A. Sameh (1978). “Efficient Calculation of the Effects of Roundoff Errors,”
ACM Trans. Math. Soft. 4, 228-36.

W. Miller and D. Spooner (1978). “Software for Roundoff Analysis, II,” ACM Trans.
Math. Soft. 4, 369-90.

J.M. Yohe (1979). “Software for Interval Arithmetic: A Reasonably Portable Package,”
ACM Trans. Math. Soft. 5, 50-63.


Anyone engaged in serious software development needs a thorough understanding of 
floating point arithmetic. A good way to begin acquiring knowledge in this direction is 
to read about the IEEE floating point standard in 


D. Goldberg (1991). “What Every Computer Scientist Should Know About Floating
Point Arithmetic,” ACM Surveys 23, 5-48.


See also 


R.P. Brent (1978). “A Fortran Multiple Precision Arithmetic Package,” ACM Trans.
Math. Soft. 4, 57-70.

R.P. Brent (1978). “Algorithm 524 MP, a Fortran Multiple Precision Arithmetic
Package,” ACM Trans. Math. Soft. 4, 71-81.

J.W. Demmel (1984). “Underflow and the Reliability of Numerical Software,” SIAM J.
Sci. and Stat. Comp. 5, 887-919.

U.W. Kulisch and W.L. Miranker (1986). “The Arithmetic of the Digital Computer,”
SIAM Review 28, 1-40.

W.J. Cody (1988). “ALGORITHM 665 MACHAR: A Subroutine to Dynamically
Determine Machine Parameters,” ACM Trans. Math. Soft. 14, 303-311.

D.H. Bailey, H.D. Simon, J.T. Barton, M.J. Fouts (1989). “Floating Point Arithmetic
in Future Supercomputers,” Int'l J. Supercomputing Appl. 3, 86-90.

D.H. Bailey (1993). “Algorithm 719: Multiprecision Translation and Execution of
FORTRAN Programs,” ACM Trans. Math. Soft. 19, 288-319.


The subtleties associated with the development of high-quality software, even for
“simple” problems, are immense. A good example is the design of a subroutine to compute
2-norms:


J.M. Blue (1978). “A Portable FORTRAN Program to Find the Euclidean Norm of a
Vector,” ACM Trans. Math. Soft. 4, 15-23.


For an analysis of the Strassen algorithm and other “fast” linear algebra procedures see

R.P. Brent (1970). “Error Analysis of Algorithms for Matrix Multiplication and
Triangular Decomposition Using Winograd's Identity,” Numer. Math. 16, 145-156.

W. Miller (1975). “Computational Complexity and Numerical Stability,” SIAM J.
Computing 4, 91-107.

N.J. Higham (1992). “Stability of a Method for Multiplying Complex Matrices with
Three Real Matrix Multiplications,” SIAM J. Matrix Anal. Appl. 13, 681-687.

J.W. Demmel and N.J. Higham (1992). “Stability of Block Algorithms with Fast Level-3
BLAS,” ACM Trans. Math. Soft. 18, 274-291.


2.5 Orthogonality and the SVD 


Orthogonality has a very prominent role to play in matrix computations. 
After establishing a few definitions we prove the extremely useful singular 
value decomposition (SVD). Among other things, the SVD enables us to 
intelligently handle the matrix rank problem. The concept of rank, though 
perfectly clear in the exact arithmetic context, is tricky in the presence of 
roundoff error and fuzzy data. With the SVD we can introduce the practical 
notion of numerical rank. 


2.5.1 Orthogonality 


A set of vectors $\{x_1, \ldots, x_p\}$ in $\mathbb{R}^m$ is orthogonal if $x_i^Tx_j = 0$ whenever
$i \ne j$ and orthonormal if $x_i^Tx_j = \delta_{ij}$. Intuitively, orthogonal vectors are
maximally independent for they point in totally different directions.

A collection of subspaces $S_1, \ldots, S_p$ in $\mathbb{R}^m$ is mutually orthogonal if
$x^Ty = 0$ whenever $x \in S_i$ and $y \in S_j$ for $i \ne j$. The orthogonal complement
of a subspace $S \subseteq \mathbb{R}^m$ is defined by

$$S^\perp = \{y \in \mathbb{R}^m : y^Tx = 0 \text{ for all } x \in S\}$$

and it is not hard to show that $\mathrm{ran}(A)^\perp = \mathrm{null}(A^T)$. The vectors $v_1, \ldots, v_k$
form an orthonormal basis for a subspace $S \subseteq \mathbb{R}^m$ if they are orthonormal
and span $S$.

A matrix $Q \in \mathbb{R}^{m\times m}$ is said to be orthogonal if $Q^TQ = I$. If $Q =
[\,q_1, \ldots, q_m\,]$ is orthogonal, then the $q_i$ form an orthonormal basis for $\mathbb{R}^m$.
It is always possible to extend an orthonormal basis $\{v_1, \ldots, v_k\}$ for a subspace
to a full orthonormal basis $\{v_1, \ldots, v_m\}$ for $\mathbb{R}^m$:


Theorem 2.5.1 If $V_1 \in \mathbb{R}^{n\times r}$ has orthonormal columns, then there exists
$V_2 \in \mathbb{R}^{n\times(n-r)}$ such that

$$V = [\,V_1 \;\; V_2\,]$$

is orthogonal. Note that $\mathrm{ran}(V_1)^\perp = \mathrm{ran}(V_2)$.


Proof. This is a standard result from introductory linear algebra. It is
also a corollary of the QR factorization that we present in §5.2. □




2.5.2 Norms and Orthogonal Transformations 


The 2-norm is invariant under orthogonal transformation, for if $Q^TQ = I$,
then $\|Qx\|_2^2 = x^TQ^TQx = x^Tx = \|x\|_2^2$. The matrix 2-norm and
the Frobenius norm are also invariant with respect to orthogonal
transformations. In particular, it is easy to show that for all orthogonal $Q$ and $Z$
of appropriate dimensions we have

$$\|QAZ\|_F = \|A\|_F \qquad (2.5.1)$$

and

$$\|QAZ\|_2 = \|A\|_2. \qquad (2.5.2)$$


2.5.3 The Singular Value Decomposition 


The theory of norms developed in the previous two sections can be used to 
prove the extremely useful singular value decomposition. 


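Theorem 2.5.2 (Singular Value Decomposition (SVD)) If $A$ is a real
m-by-n matrix, then there exist orthogonal matrices

$$U = [\,u_1, \ldots, u_m\,] \in \mathbb{R}^{m\times m} \quad\text{and}\quad V = [\,v_1, \ldots, v_n\,] \in \mathbb{R}^{n\times n}$$

such that

$$U^TAV = \mathrm{diag}(\sigma_1, \ldots, \sigma_p) \in \mathbb{R}^{m\times n}, \qquad p = \min\{m, n\},$$

where $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_p \ge 0$.


Proof. Let $x \in \mathbb{R}^n$ and $y \in \mathbb{R}^m$ be unit 2-norm vectors that satisfy $Ax =
\sigma y$ with $\sigma = \|A\|_2$. From Theorem 2.5.1 there exist $V_2 \in \mathbb{R}^{n\times(n-1)}$ and
$U_2 \in \mathbb{R}^{m\times(m-1)}$ so $V = [\,x \;\; V_2\,] \in \mathbb{R}^{n\times n}$ and $U = [\,y \;\; U_2\,] \in \mathbb{R}^{m\times m}$ are
orthogonal. It is not hard to show that $U^TAV$ has the following structure:

$$U^TAV = \begin{bmatrix} \sigma & w^T \\ 0 & B \end{bmatrix} \equiv A_1.$$

Since

$$\left\|A_1\begin{bmatrix} \sigma \\ w \end{bmatrix}\right\|_2^2 \ge (\sigma^2 + w^Tw)^2$$

and $\|[\sigma;\, w]\|_2^2 = \sigma^2 + w^Tw$, we have $\|A_1\|_2^2 \ge (\sigma^2 + w^Tw)$. But $\sigma^2 = \|A\|_2^2 = \|A_1\|_2^2$, and so we
must have $w = 0$. An obvious induction argument completes the proof of
the theorem. □


The $\sigma_i$ are the singular values of $A$ and the vectors $u_i$ and $v_i$ are the
$i$th left singular vector and the $i$th right singular vector respectively. It
is easy to verify by comparing columns in the equations $AV = U\Sigma$ and
$A^TU = V\Sigma^T$ that

$$\left.\begin{aligned} Av_i &= \sigma_i u_i \\ A^Tu_i &= \sigma_i v_i \end{aligned}\right\} \qquad i = 1{:}\min\{m, n\}.$$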


It is convenient to have the following notation for designating singular
values:

$$\sigma_i(A) = \text{the } i\text{th largest singular value of } A,$$
$$\sigma_{\max}(A) = \text{the largest singular value of } A,$$
$$\sigma_{\min}(A) = \text{the smallest singular value of } A.$$

The singular values of a matrix $A$ are precisely the lengths of the semi-axes
of the hyperellipsoid $E$ defined by $E = \{Ax : \|x\|_2 = 1\}$.


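Example 2.5.1

$$A = \begin{bmatrix} .96 & 1.72 \\ 2.28 & .96 \end{bmatrix} = U\Sigma V^T = \begin{bmatrix} .6 & -.8 \\ .8 & .6 \end{bmatrix}\begin{bmatrix} 3 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} .8 & .6 \\ .6 & -.8 \end{bmatrix}^T$$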
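Because every factor in Example 2.5.1 has short decimal entries, the decomposition can be verified exactly with rational arithmetic. The sketch below uses Python's `fractions` module to check four things: $U$ and $V$ are orthogonal, $U\Sigma V^T = A$, the column relations $AV = U\Sigma$ hold, and $\|A\|_F^2 = \sigma_1^2 + \sigma_2^2 = 10$. The helper names are our own.

```python
from fractions import Fraction as Fr


def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    return [list(col) for col in zip(*X)]


A = [[Fr("0.96"), Fr("1.72")], [Fr("2.28"), Fr("0.96")]]
U = [[Fr("0.6"), Fr("-0.8")], [Fr("0.8"), Fr("0.6")]]
S = [[Fr(3), Fr(0)], [Fr(0), Fr(1)]]
V = [[Fr("0.8"), Fr("0.6")], [Fr("0.6"), Fr("-0.8")]]

reconstructed = matmul(matmul(U, S), transpose(V))    # U Sigma V^T
print(reconstructed == A)                             # exact comparison
print(matmul(A, V) == matmul(U, S))                   # A v_i = sigma_i u_i
print(sum(a * a for row in A for a in row))           # ||A||_F^2 = 3^2 + 1^2
```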


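The SVD reveals a great deal about the structure of a matrix. If the
SVD of $A$ is given by Theorem 2.5.2, and we define $r$ by

$$\sigma_1 \ge \cdots \ge \sigma_r > \sigma_{r+1} = \cdots = \sigma_p = 0,$$

then

$$\mathrm{rank}(A) = r \qquad (2.5.3)$$
$$\mathrm{null}(A) = \mathrm{span}\{v_{r+1}, \ldots, v_n\} \qquad (2.5.4)$$
$$\mathrm{ran}(A) = \mathrm{span}\{u_1, \ldots, u_r\}, \qquad (2.5.5)$$

and we have the SVD expansion

$$A = \sum_{i=1}^{r}\sigma_i u_i v_i^T. \qquad (2.5.6)$$

Various 2-norm and Frobenius norm properties have connections to the
SVD. If $A \in \mathbb{R}^{m\times n}$, then

$$\|A\|_F^2 = \sigma_1^2 + \cdots + \sigma_p^2, \qquad p = \min\{m, n\} \qquad (2.5.7)$$
$$\|A\|_2 = \sigma_1 \qquad (2.5.8)$$
$$\min_{x \ne 0}\frac{\|Ax\|_2}{\|x\|_2} = \sigma_n \quad (m \ge n). \qquad (2.5.9)$$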
2.5.4 The Thin SVD
If $A = U\Sigma V^T \in \mathbb{R}^{m\times n}$ is the SVD of $A$ and $m \ge n$, then

$$A = U_1\Sigma_1V^T$$

where

$$U_1 = U(:, 1{:}n) = [\,u_1, \ldots, u_n\,] \in \mathbb{R}^{m\times n}$$

and

$$\Sigma_1 = \Sigma(1{:}n, 1{:}n) = \mathrm{diag}(\sigma_1, \ldots, \sigma_n) \in \mathbb{R}^{n\times n}.$$

We refer to this much-used, trimmed down version of the SVD as the thin
SVD.


2.5.5 Rank Deficiency and the SVD 


One of the most valuable aspects of the SVD is that it enables us to deal 
sensibly with the concept of matrix rank. Numerous theorems in linear 
algebra have the form "if such-and-such a matrix has full rank, then such- 
and-such a property holds." While neat and aesthetic, results of this flavor 
do not help us address the numerical difficulties frequently encountered in 
situations where near rank deficiency prevails. Rounding errors and fuzzy 
data make rank determination a nontrivial exercise. Indeed, for some small
$\epsilon$ we may be interested in the $\epsilon$-rank of a matrix which we define by

$$\mathrm{rank}(A, \epsilon) = \min_{\|A - B\|_2 \le \epsilon}\mathrm{rank}(B).$$

Thus, if $A$ is obtained in a laboratory with each $a_{ij}$ correct to within $\pm .001$,
then it might make sense to look at $\mathrm{rank}(A, .001)$. Along the same lines, if
$A$ is an m-by-n floating point matrix then it is reasonable to regard $A$ as
numerically rank deficient if $\mathrm{rank}(A, \epsilon) < \min\{m, n\}$ with $\epsilon = u\|A\|_2$.

Numerical rank deficiency and $\epsilon$-rank are nicely characterized in terms
of the SVD because the singular values indicate how near a given matrix is
to a matrix of lower rank.


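Theorem 2.5.3 Let the SVD of $A \in \mathbb{R}^{m\times n}$ be given by Theorem 2.5.2. If
$k < r = \mathrm{rank}(A)$ and

$$A_k = \sum_{i=1}^{k}\sigma_i u_i v_i^T, \qquad (2.5.10)$$

then

$$\min_{\mathrm{rank}(B) = k}\|A - B\|_2 = \|A - A_k\|_2 = \sigma_{k+1}. \qquad (2.5.11)$$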


2.5. ORTHOGONALITY AND THE SVD 73 


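Proof. Since $U^TA_kV = \mathrm{diag}(\sigma_1, \ldots, \sigma_k, 0, \ldots, 0)$ it follows that $\mathrm{rank}(A_k) =
k$ and that $U^T(A - A_k)V = \mathrm{diag}(0, \ldots, 0, \sigma_{k+1}, \ldots, \sigma_p)$ and so $\|A - A_k\|_2 =
\sigma_{k+1}$.

Now suppose $\mathrm{rank}(B) = k$ for some $B \in \mathbb{R}^{m\times n}$. It follows that we can
find orthonormal vectors $x_1, \ldots, x_{n-k}$ so $\mathrm{null}(B) = \mathrm{span}\{x_1, \ldots, x_{n-k}\}$.
A dimension argument shows that

$$\mathrm{span}\{x_1, \ldots, x_{n-k}\} \cap \mathrm{span}\{v_1, \ldots, v_{k+1}\} \ne \{0\}.$$

Let $z$ be a unit 2-norm vector in this intersection. Since $Bz = 0$ and

$$Az = \sum_{i=1}^{k+1}\sigma_i(v_i^Tz)u_i$$

we have

$$\|A - B\|_2^2 \ge \|(A - B)z\|_2^2 = \|Az\|_2^2 = \sum_{i=1}^{k+1}\sigma_i^2(v_i^Tz)^2 \ge \sigma_{k+1}^2,$$

completing the proof of the theorem. □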

Theorem 2.5.3 says that the smallest singular value of A is the 2-norm 
distance of A to the set of all rank-deficient matrices. It also follows that 


the set of full rank matrices in $\mathbb{R}^{m \times n}$ is both open and dense.
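Theorem 2.5.3 is easy to check numerically. The sketch below (NumPy assumed; `truncated_svd` is a hypothetical helper) forms $A_k$ as in (2.5.10) and compares $\| A - A_k \|_2$ with $\sigma_{k+1}$. Python indexes from 0, so $\sigma_{k+1}$ is `s[k]`.

```python
import numpy as np


def truncated_svd(A, k):
    """Return A_k = sum_{i=1}^k sigma_i u_i v_i^T from (2.5.10) and the singular values of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    Ak = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
    return Ak, s


rng = np.random.default_rng(1)
A = rng.standard_normal((5, 4))
k = 2
Ak, s = truncated_svd(A, k)

# ||A - A_k||_2 equals sigma_{k+1}; with 0-based indexing that is s[k].
print(np.isclose(np.linalg.norm(A - Ak, 2), s[k]))   # True

# Any other rank-k matrix B is at least as far from A, as (2.5.11) asserts.
B = rng.standard_normal((5, k)) @ rng.standard_normal((k, 4))
print(np.linalg.norm(A - B, 2) >= s[k])               # True
```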
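Finally, if $r_\epsilon = \mathrm{rank}(A, \epsilon)$, then

$$\sigma_1 \ge \cdots \ge \sigma_{r_\epsilon} > \epsilon \ge \sigma_{r_\epsilon + 1} \ge \cdots \ge \sigma_p, \qquad p = \min\{m, n\}.$$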


We have more to say about the numerical rank issue in §5.5 and §12.2. 
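A minimal NumPy sketch of $\epsilon$-rank follows. It counts the singular values that exceed $\epsilon$, which by the ordering above equals $r_\epsilon$. The helper names are hypothetical. Taking $\mathbf{u}$ as half of `np.finfo(np.float64).eps` assumes IEEE double precision with rounding to nearest.

```python
import numpy as np


def eps_rank(A, eps):
    """rank(A, eps): the number of singular values of A that exceed eps."""
    s = np.linalg.svd(A, compute_uv=False)
    return int(np.sum(s > eps))


def numerical_rank(A):
    """rank(A, u*||A||_2), with u taken as the unit roundoff of IEEE double precision."""
    u = np.finfo(np.float64).eps / 2   # assumption: round-to-nearest, so u = 2**-53
    return eps_rank(A, u * np.linalg.norm(A, 2))


# A nearly rank-deficient matrix: its smallest singular value is about 5e-11.
A = np.array([[1.0, 1.0],
              [1.0, 1.0 + 1e-10]])
print(eps_rank(A, 1e-3))   # 1: rank deficient at the level of laboratory data
print(numerical_rank(A))   # 2: full rank at the level of roundoff
```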


2.5.6 Unitary Matrices 


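Over the complex field the unitary matrices correspond to the orthogonal matrices. In particular, $Q \in \mathbb{C}^{n \times n}$ is unitary if $Q^H Q = Q Q^H = I_n$. Unitary matrices preserve the 2-norm. The SVD of a complex matrix involves unitary matrices. If $A \in \mathbb{C}^{m \times n}$, then there exist unitary matrices $U \in \mathbb{C}^{m \times m}$ and $V \in \mathbb{C}^{n \times n}$ such that

$$U^H A V = \mathrm{diag}(\sigma_1, \ldots, \sigma_p) \in \mathbb{R}^{m \times n}, \qquad p = \min\{m, n\},$$

where $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_p \ge 0$.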
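For a complex matrix, NumPy's `np.linalg.svd` returns unitary factors and real singular values. The sketch below (NumPy assumed, random test data) checks that $U^H U = I$, that $U^H A V$ is the real diagonal matrix of singular values, and that $U$ preserves the 2-norm. NumPy returns $V^H$ rather than $V$, so the sketch conjugate-transposes it.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 4, 3
A = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))

# Complex input: U is m-by-m unitary, s is real, and Vh = V^H is n-by-n.
U, s, Vh = np.linalg.svd(A)
V = Vh.conj().T

Sigma = np.zeros((m, n))
Sigma[:n, :n] = np.diag(s)          # p = min(m, n) = n here

print(np.allclose(U.conj().T @ U, np.eye(m)))    # True: U is unitary
print(np.allclose(U.conj().T @ A @ V, Sigma))    # True: U^H A V is real and diagonal

# Unitary matrices preserve the 2-norm.
z = rng.standard_normal(m) + 1j * rng.standard_normal(m)
print(np.isclose(np.linalg.norm(U @ z), np.linalg.norm(z)))   # True
```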


Problems 


P2.5.1 Show that if $S$ is real and $S^T = -S$, then $I - S$ is nonsingular and the matrix $(I - S)^{-1}(I + S)$ is orthogonal. This is known as the Cayley transform of $S$.
P2.5.2 Show that a triangular orthogonal matrix is diagonal.
P2.5.3 Show that if $Q = Q_1 + iQ_2$ is unitary with $Q_1, Q_2 \in \mathbb{R}^{n \times n}$, then the $2n$-by-$2n$ real matrix

$$Z = \begin{bmatrix} Q_1 & -Q_2 \\ Q_2 & Q_1 \end{bmatrix}$$

is orthogonal.
P2.5.4 Establish properties (2.5.3)-(2.5.9).
P2.5.5 Prove that

$$\sigma_{\max}(A) = \max_{\substack{y \in \mathbb{R}^m,\, y \ne 0 \\ x \in \mathbb{R}^n,\, x \ne 0}} \frac{y^T A x}{\| x \|_2 \| y \|_2}.$$

P2.5.6 For the 2-by-2 matrix $A = \begin{bmatrix} w & x \\ y & z \end{bmatrix}$, derive expressions for $\sigma_{\max}(A)$ and $\sigma_{\min}(A)$ that are functions of $w$, $x$, $y$, and $z$.
P2.5.7 Show that any matrix in $\mathbb{R}^{m \times n}$ is the limit of a sequence of full rank matrices.


P2.5.8 Show that if $A \in \mathbb{R}^{m \times n}$ has rank $n$, then $\| A (A^T A)^{-1} A^T \|_2 = 1$.
P2.5.9 What is the nearest rank-one matrix to $A = \begin{bmatrix} 1 & M \\ 0 & 1 \end{bmatrix}$ in the Frobenius norm?


P2.5.10 Show that if $A \in \mathbb{R}^{m \times n}$ then $\| A \|_F \le \sqrt{\mathrm{rank}(A)}\, \| A \|_2$, thereby sharpening (2.3.7).


Notes and References for Sec. 2.5


Forsythe and Moler (1967) offer a good account of the SVD's role in the analysis of the $Ax = b$ problem. Their proof of the decomposition is more traditional than ours in that it makes use of the eigenvalue theory for symmetric matrices. Historical SVD references include


E. Beltrami (1873). “Sulle Funzioni Bilineari,” Giornale di Matematiche 11, 98-106.

C. Eckart and G. Young (1939). “A Principal Axis Transformation for Non-Hermitian Matrices,” Bull. Amer. Math. Soc. 45, 118-21.

G.W. Stewart (1993). “On the Early History of the Singular Value Decomposition,” SIAM Review 35, 551-566.


One of the most significant developments in scientific computation has been the increased use of the SVD in application areas that require the intelligent handling of matrix rank. The range of applications is impressive. One of the most interesting is

C.B. Moler and D. Morrison (1983). “Singular Value Analysis of Cryptograms,” Amer. Math. Monthly 90, 78-87.

For generalizations of the SVD to infinite dimensional Hilbert space, see

I.C. Gohberg and M.G. Krein (1969). Introduction to the Theory of Linear Non-Self Adjoint Operators, Amer. Math. Soc., Providence, R.I.

F. Smithies (1970). Integral Equations, Cambridge University Press, Cambridge.

Reducing the rank of a matrix as in Theorem 2.5.3 when the perturbing matrix is constrained is discussed in

J.W. Demmel (1987). “The smallest perturbation of a submatrix which lowers the rank and constrained total least squares problems,” SIAM J. Numer. Anal. 24, 199-206.

G.H. Golub, A. Hoffman, and G.W. Stewart (1988). “A Generalization of the Eckart-Young-Mirsky Approximation Theorem,” Lin. Alg. and Its Applic. 88/89, 317-328.

G.A. Watson (1988). “The Smallest Perturbation of a Submatrix which Lowers the Rank of the Matrix,” IMA J. Numer. Anal. 8, 295-304.


2.6 Projections and the CS Decomposition 


If the object of a computation is to compute a matrix or a vector, then 
norms are useful for assessing the accuracy of the answer or for measuring 
progress during an iteration. If the object of a computation is to compute 
a subspace, then to make similar comments we need to be able to quantify
the distance between two subspaces. Orthogonal projections are critical in
this regard. After the elementary concepts are established we discuss the
CS decomposition. This is an SVD-like decomposition that is handy when
having to compare a pair of subspaces. We begin with the notion of an
orthogonal projection. 


2.6.1 Orthogonal Projections 


Let $S \subseteq \mathbb{R}^n$ be a subspace. $P \in \mathbb{R}^{n \times n}$ is the orthogonal projection onto $S$ if $\mathrm{ran}(P) = S$, $P^2 = P$, and $P^T = P$. From this definition it is easy to show that if $x \in \mathbb{R}^n$, then $Px \in S$ and $(I - P)x \in S^\perp$.

If $P_1$ and $P_2$ are each orthogonal projections, then for any $z \in \mathbb{R}^n$ we have

$$\| (P_1 - P_2) z \|_2^2 = (P_1 z)^T (I - P_2) z + (P_2 z)^T (I - P_1) z.$$

If $\mathrm{ran}(P_1) = \mathrm{ran}(P_2) = S$, then the right-hand side of this expression is zero (since $P_1 z \in S$ while $(I - P_2) z \in S^\perp$, and similarly for the second term), showing that the orthogonal projection for a subspace is unique. If the columns of $V = [\, v_1, \ldots, v_k \,]$ are an orthonormal basis for a subspace $S$, then it is easy to show that $P = VV^T$ is the unique orthogonal projection onto $S$. Note that if $v \in \mathbb{R}^n$, then $P = vv^T / v^T v$ is the orthogonal projection onto $S = \mathrm{span}\{v\}$.
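The projection facts above can be checked with a few lines of NumPy (an assumption on our part). The hypothetical helper `orth_projector` builds $P = VV^T$ from a reduced QR factorization, which supplies an orthonormal basis $V$ of $\mathrm{ran}(X)$. The checks confirm $P^2 = P$, $P^T = P$, $PX = X$, and that a different basis of the same subspace gives the same $P$.

```python
import numpy as np


def orth_projector(X):
    """Orthogonal projection onto ran(X); assumes X has full column rank."""
    V, _ = np.linalg.qr(X)   # reduced QR: the columns of V are an orthonormal basis for ran(X)
    return V @ V.T


rng = np.random.default_rng(3)
X = rng.standard_normal((5, 2))
P = orth_projector(X)

# The defining properties: P^2 = P, P^T = P, and P acts as the identity on ran(X).
print(np.allclose(P @ P, P), np.allclose(P.T, P), np.allclose(P @ X, X))   # True True True

# Uniqueness: a different basis X M of the same subspace gives the same projection.
M = np.array([[2.0, 1.0], [0.0, 3.0]])   # any nonsingular 2-by-2 matrix
print(np.allclose(orth_projector(X @ M), P))   # True

# Projection onto span{v}: P = v v^T / v^T v.
v = rng.standard_normal(5)
Pv = np.outer(v, v) / (v @ v)
```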


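2.6.2 SVD-Related Projections


There are several important orthogonal projections associated with the singular value decomposition. Suppose $A = U \Sigma V^T \in \mathbb{R}^{m \times n}$ is the SVD of $A$ and that $r = \mathrm{rank}(A)$. If we have the $U$ and $V$ partitionings

$$U = [\, U_r \;\; \tilde{U}_r \,], \qquad V = [\, V_r \;\; \tilde{V}_r \,]$$

where $U_r$ and $V_r$ have $r$ columns while $\tilde{U}_r$ and $\tilde{V}_r$ have $m - r$ and $n - r$ columns, respectively, then

$$\begin{aligned} V_r V_r^T &= \text{projection onto } \mathrm{null}(A)^\perp = \mathrm{ran}(A^T) \\ \tilde{V}_r \tilde{V}_r^T &= \text{projection onto } \mathrm{null}(A) \\ U_r U_r^T &= \text{projection onto } \mathrm{ran}(A) \\ \tilde{U}_r \tilde{U}_r^T &= \text{projection onto } \mathrm{ran}(A)^\perp = \mathrm{null}(A^T) \end{aligned}$$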
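Here is a sketch, assuming NumPy, that forms the four projections from a full SVD. Deciding $r$ numerically needs a tolerance. The default below, $\max(m, n)\,\mathrm{eps}\,\sigma_1$, is our assumption and mirrors the default of `numpy.linalg.matrix_rank`. The function name `svd_projections` is hypothetical.

```python
import numpy as np


def svd_projections(A, tol=None):
    """Return the four SVD-related orthogonal projections of A and the rank r used to split U, V.

    The default tolerance max(m, n) * eps * sigma_1 is an assumption (it mirrors the
    default used by numpy.linalg.matrix_rank).
    """
    U, s, Vt = np.linalg.svd(A)          # full U (m-by-m) and V (n-by-n)
    V = Vt.T
    if tol is None:
        tol = max(A.shape) * np.finfo(A.dtype).eps * s[0]
    r = int(np.sum(s > tol))
    Ur, Ur_t = U[:, :r], U[:, r:]
    Vr, Vr_t = V[:, :r], V[:, r:]
    return {
        "ran(A^T)": Vr @ Vr.T,          # = null(A)^perp
        "null(A)": Vr_t @ Vr_t.T,
        "ran(A)": Ur @ Ur.T,
        "null(A^T)": Ur_t @ Ur_t.T,     # = ran(A)^perp
    }, r


# Rank-2 example: row 1 = row 3 + 2 * row 4 and row 2 = 2 * row 1.
A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
P, r = svd_projections(A)
print(r)
```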
2.6.3 Distance Between Subspaces


The one-to-one correspondence between subspaces and orthogonal projections enables us to devise a notion of distance between subspaces. Suppose $S_1$ and $S_2$ are subspaces of $\mathbb{R}^n$ and that $\dim(S_1) = \dim(S_2)$. We define the distance between these two spaces by

$$\mathrm{dist}(S_1, S_2) = \| P_1 - P_2 \|_2 \qquad (2.6.1)$$

where $P_i$ is the orthogonal projection onto $S_i$. The distance between a pair of subspaces can be characterized in terms of the blocks of a certain orthogonal matrix.


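Theorem 2.6.1 Suppose

$$W = [\, W_1 \;\; W_2 \,], \qquad Z = [\, Z_1 \;\; Z_2 \,], \qquad W_1, Z_1 \in \mathbb{R}^{n \times k}, \quad W_2, Z_2 \in \mathbb{R}^{n \times (n-k)},$$

are $n$-by-$n$ orthogonal matrices. If $S_1 = \mathrm{ran}(W_1)$ and $S_2 = \mathrm{ran}(Z_1)$, then

$$\mathrm{dist}(S_1, S_2) = \| W_1^T Z_2 \|_2 = \| Z_1^T W_2 \|_2.$$

Proof.

$$\mathrm{dist}(S_1, S_2) = \| W_1 W_1^T - Z_1 Z_1^T \|_2 = \| W^T (W_1 W_1^T - Z_1 Z_1^T) Z \|_2 = \left\| \begin{bmatrix} 0 & W_1^T Z_2 \\ -W_2^T Z_1 & 0 \end{bmatrix} \right\|_2.$$

Note that the matrices $W_2^T Z_1$ and $W_1^T Z_2$ are submatrices of the orthogonal matrix

$$Q = \begin{bmatrix} Q_{11} & Q_{12} \\ Q_{21} & Q_{22} \end{bmatrix} = \begin{bmatrix} W_1^T Z_1 & W_1^T Z_2 \\ W_2^T Z_1 & W_2^T Z_2 \end{bmatrix} = W^T Z.$$

Since the 2-norm of the block matrix above is $\max\{ \| Q_{12} \|_2, \| Q_{21} \|_2 \}$, our goal is to show that $\| Q_{21} \|_2 = \| Q_{12} \|_2$. Since $Q$ is orthogonal it follows from

$$Q \begin{bmatrix} x \\ 0 \end{bmatrix} = \begin{bmatrix} Q_{11} x \\ Q_{21} x \end{bmatrix}$$

that

$$1 = \| Q_{11} x \|_2^2 + \| Q_{21} x \|_2^2$$

for all unit 2-norm $x \in \mathbb{R}^k$. Thus,

$$\| Q_{21} \|_2^2 = \max_{\| x \|_2 = 1} \| Q_{21} x \|_2^2 = 1 - \min_{\| x \|_2 = 1} \| Q_{11} x \|_2^2 = 1 - \sigma_{\min}(Q_{11})^2.$$

Analogously, by working with $Q^T$ (which is also orthogonal) it is possible to show that

$$\| Q_{12}^T \|_2^2 = 1 - \sigma_{\min}(Q_{11})^2$$

and therefore

$$\| Q_{12} \|_2^2 = 1 - \sigma_{\min}(Q_{11})^2.$$

Thus, $\| Q_{21} \|_2 = \| Q_{12} \|_2$. □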
Note that if $S_1$ and $S_2$ are subspaces in $\mathbb{R}^n$ with the same dimension, then

$$0 \le \mathrm{dist}(S_1, S_2) \le 1.$$

The distance is zero if $S_1 = S_2$ and one if $S_1 \cap S_2^\perp \ne \{0\}$.
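Theorem 2.6.1 can be checked numerically. In the NumPy sketch below (helper names hypothetical), a complete QR factorization extends an orthonormal basis of $\mathrm{ran}(X)$ to an orthogonal $W$. The distance is then computed both from (2.6.1) and as $\| W_1^T Z_2 \|_2$. The last lines illustrate the two-line case of P2.6.3.

```python
import numpy as np


def orth_completion(X):
    """Orthogonal n-by-n matrix whose first k columns span ran(X) (X is n-by-k, full column rank)."""
    W, _ = np.linalg.qr(X, mode="complete")
    return W


def dist_projections(X, Y):
    """dist(S1, S2) = ||P1 - P2||_2 as in (2.6.1), with S1 = ran(X) and S2 = ran(Y)."""
    k = X.shape[1]
    W1 = orth_completion(X)[:, :k]
    Z1 = orth_completion(Y)[:, :k]
    return np.linalg.norm(W1 @ W1.T - Z1 @ Z1.T, 2)


def dist_blocks(X, Y):
    """dist(S1, S2) = ||W1^T Z2||_2, the characterization in Theorem 2.6.1."""
    k = X.shape[1]
    W, Z = orth_completion(X), orth_completion(Y)
    return np.linalg.norm(W[:, :k].T @ Z[:, k:], 2)


rng = np.random.default_rng(5)
X = rng.standard_normal((6, 2))
Y = rng.standard_normal((6, 2))
print(np.isclose(dist_projections(X, Y), dist_blocks(X, Y)))   # True

# Two lines in R^2 at angle t: the distance is sin(t), compare P2.6.3.
t = 0.3
x = np.array([[1.0], [0.0]])
y = np.array([[np.cos(t)], [np.sin(t)]])
print(np.isclose(dist_blocks(x, y), np.sin(t)))                # True
```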

A more refined analysis of the blocks of the Q matrix above sheds more 
light on the difference between a pair of subspaces. This requires a special
SVD-like decomposition for orthogonal matrices. 


2.6.4 The CS Decomposition 


The blocks of an orthogonal matrix partitioned into 2-by-2 form have highly 
related SVDs. This is the gist of the CS decomposition. We prove a very 
useful special case first. 


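Theorem 2.6.2 (The CS Decomposition (Thin Version)) Consider the matrix

$$Q = \begin{bmatrix} Q_1 \\ Q_2 \end{bmatrix}, \qquad Q_1 \in \mathbb{R}^{m_1 \times n}, \quad Q_2 \in \mathbb{R}^{m_2 \times n}$$

where $m_1 \ge n$ and $m_2 \ge n$. If the columns of $Q$ are orthonormal, then there exist orthogonal matrices $U_1 \in \mathbb{R}^{m_1 \times m_1}$, $U_2 \in \mathbb{R}^{m_2 \times m_2}$, and $V_1 \in \mathbb{R}^{n \times n}$ such that

$$\begin{bmatrix} U_1 & 0 \\ 0 & U_2 \end{bmatrix}^T \begin{bmatrix} Q_1 \\ Q_2 \end{bmatrix} V_1 = \begin{bmatrix} C \\ S \end{bmatrix}$$

where

$$C = \mathrm{diag}(\cos(\theta_1), \ldots, \cos(\theta_n)), \qquad S = \mathrm{diag}(\sin(\theta_1), \ldots, \sin(\theta_n)),$$

and

$$0 \le \theta_1 \le \theta_2 \le \cdots \le \theta_n \le \frac{\pi}{2}.$$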


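Proof. Since $\| Q_1 \|_2 \le \| Q \|_2 = 1$, the singular values of $Q_1$ are all in the interval $[0, 1]$. Let

$$U_1^T Q_1 V_1 = C = \mathrm{diag}(c_1, \ldots, c_n) = \begin{bmatrix} I_t & 0 \\ 0 & \Sigma \end{bmatrix}, \qquad I_t \in \mathbb{R}^{t \times t},$$

be the SVD of $Q_1$, where we assume

$$1 = c_1 = \cdots = c_t > c_{t+1} \ge \cdots \ge c_n \ge 0.$$

To complete the proof of the theorem we must construct the orthogonal matrix $U_2$. If

$$Q_2 V_1 = [\, W_1 \;\; W_2 \,], \qquad W_1 \in \mathbb{R}^{m_2 \times t}, \quad W_2 \in \mathbb{R}^{m_2 \times (n - t)},$$

then

$$\begin{bmatrix} U_1 & 0 \\ 0 & I_{m_2} \end{bmatrix}^T \begin{bmatrix} Q_1 \\ Q_2 \end{bmatrix} V_1 = \begin{bmatrix} I_t & 0 \\ 0 & \Sigma \\ W_1 & W_2 \end{bmatrix}.$$

Since the columns of this matrix have unit 2-norm, $W_1 = 0$. The columns of $W_2$ are nonzero and mutually orthogonal because

$$W_2^T W_2 = I_{n-t} - \Sigma^T \Sigma = \mathrm{diag}(1 - c_{t+1}^2, \ldots, 1 - c_n^2)$$

is nonsingular. If $s_k = \sqrt{1 - c_k^2}$ for $k = 1{:}n$, then the columns of

$$Z = W_2 \, \mathrm{diag}(1/s_{t+1}, \ldots, 1/s_n)$$

are orthonormal. By Theorem 2.5.1 there exists an orthogonal matrix $U_2 \in \mathbb{R}^{m_2 \times m_2}$ with $U_2(:, t+1{:}n) = Z$. It is easy to verify that

$$U_2^T Q_2 V_1 = \mathrm{diag}(s_1, \ldots, s_n) = S.$$

Since $c_k^2 + s_k^2 = 1$ for $k = 1{:}n$, it follows that these quantities are the required cosines and sines. □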


Using the same sort of techniques it is possible to prove the following more 
general version of the decomposition: 


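Theorem 2.6.3 (CS Decomposition (General Version)) If

$$Q = \begin{bmatrix} Q_{11} & Q_{12} \\ Q_{21} & Q_{22} \end{bmatrix}$$

is a 2-by-2 (arbitrary) partitioning of an $n$-by-$n$ orthogonal matrix, then there exist orthogonal

$$U = \begin{bmatrix} U_1 & 0 \\ 0 & U_2 \end{bmatrix} \quad \text{and} \quad V = \begin{bmatrix} V_1 & 0 \\ 0 & V_2 \end{bmatrix}$$

such that

$$U^T Q V = \left[ \begin{array}{ccc|ccc} I & 0 & 0 & 0 & 0 & 0 \\ 0 & C & 0 & 0 & S & 0 \\ 0 & 0 & 0 & 0 & 0 & I \\ \hline 0 & 0 & 0 & I & 0 & 0 \\ 0 & S & 0 & 0 & -C & 0 \\ 0 & 0 & I & 0 & 0 & 0 \end{array} \right]$$

where $C = \mathrm{diag}(c_1, \ldots, c_p)$ and $S = \mathrm{diag}(s_1, \ldots, s_p)$ are square diagonal matrices with $0 < c_i, s_i < 1$.


Proof. See Paige and Saunders (1981) for details. We have suppressed the dimensions of the zero submatrices, some of which may be empty. □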

The essential message of the decomposition is that the SVDs of the $Q_{ij}$ are highly related.
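The relationship between the SVDs of $Q_1$ and $Q_2$ in Theorem 2.6.2 can be observed numerically. This NumPy sketch (random data, NumPy assumed) does not build the full decomposition. It reproduces the key step of the proof, that $V_1$ from the SVD of $Q_1$ makes the columns of $Q_2 V_1$ orthogonal with squared norms $1 - c_k^2$, and then checks that $c_k^2 + s_k^2 = 1$.

```python
import numpy as np

rng = np.random.default_rng(4)
m1, m2, n = 5, 4, 3                      # m1 >= n and m2 >= n, as in Theorem 2.6.2
Q, _ = np.linalg.qr(rng.standard_normal((m1 + m2, n)))   # orthonormal columns
Q1, Q2 = Q[:m1, :], Q[m1:, :]

# The SVD of the top block gives U1, the cosines c_k (in descending order) and V1.
U1, c, V1t = np.linalg.svd(Q1, full_matrices=False)
V1 = V1t.T

# The same V1 makes the columns of Q2 V1 mutually orthogonal with squared norms 1 - c_k^2.
G = Q2 @ V1
print(np.allclose(G.T @ G, np.diag(1.0 - c**2)))   # True

# Hence the singular values of Q2 are the sines s_k = sqrt(1 - c_k^2), in the opposite order.
s = np.linalg.svd(Q2, compute_uv=False)
print(np.allclose(c**2 + s[::-1]**2, 1.0))          # True
theta = np.arccos(np.clip(c, 0.0, 1.0))             # 0 <= theta_1 <= ... <= theta_n <= pi/2
```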


Example 2.6.1 The matrix 


-0.7576 0.3697 | 0.3838 — 0.2128 -03112 
-0.4077 -0.1552| -0.1129 — 0.2876 — 08517 
Q = | -0048 


UTQV = 


The angles associated with the cosines and sines turn out to be very important in a number of applications. See §12.4.


Problems 


P2.6.1 Show that if P is an orthogonal projection, then Q = I — 2P is orthogonal. 
P2.6.2 What are the singular values of an orthogonal projection?


P2.6.3 Suppose $S_1 = \mathrm{span}\{x\}$ and $S_2 = \mathrm{span}\{y\}$, where $x$ and $y$ are unit 2-norm vectors in $\mathbb{R}^2$. Working only with the definition of $\mathrm{dist}(\cdot, \cdot)$, show that $\mathrm{dist}(S_1, S_2) = \sqrt{1 - (x^T y)^2}$, verifying that the distance between $S_1$ and $S_2$ equals the sine of the angle between $x$ and $y$.


Notes and References for Sec. 2.6
The following papers discuss various aspects of the CS decomposition:


C. Davis and W. Kahan (1970). “The Rotation of Eigenvectors by a Perturbation III,” SIAM J. Num. Anal. 7, 1-46.

G.W. Stewart (1977). “On the Perturbation of Pseudo-Inverses, Projections and Linear Least Squares Problems,” SIAM Review 19, 634-662.

C.C. Paige and M. Saunders (1981). “Toward a Generalized Singular Value Decomposition,” SIAM J. Num. Anal. 18, 398-405.

C.C. Paige and M. Wei (1994). “History and Generality of the CS Decomposition,” Lin. Alg. and Its Applic. 208/209, 303-328.
See §8.7 for some computational details.
For a deeper geometrical understanding of the CS decomposition and the notion of distance between subspaces, see


T.A. Arias, A. Edelman, and S. Smith (1996). “Conjugate Gradient and Newton's Method on the Grassmann and Stiefel Manifolds,” to appear in SIAM J. Matrix Anal. Appl.


2.7 The Sensitivity of Square Systems 


We now use some of the tools developed in previous sections to analyze the linear system problem $Ax = b$ where $A \in \mathbb{R}^{n \times n}$ is nonsingular and $b \in \mathbb{R}^n$. Our aim is to examine how perturbations in $A$ and $b$ affect the solution $x$. A much more detailed treatment may be found in Higham (1996).


2.7.1 An SVD Analysis
If

$$A = \sum_{i=1}^{n} \sigma_i u_i v_i^T = U \Sigma V^T$$

is the SVD of $A$, then

$$x = A^{-1} b = (U \Sigma V^T)^{-1} b = \sum_{i=1}^{n} \frac{u_i^T b}{\sigma_i} v_i. \qquad (2.7.1)$$

This expansion shows that small changes in $A$ or $b$ can induce relatively large changes in $x$ if $\sigma_n$ is small.
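Expansion (2.7.1) can be evaluated directly. In this NumPy sketch (the helper name is hypothetical), the right-hand side is perturbed by $\delta$ along $u_n$, and the solution moves by $\delta / \sigma_n$ along $v_n$, which is the sensitivity the expansion predicts when $\sigma_n$ is small.

```python
import numpy as np


def solve_via_svd(A, b):
    """x = sum_i (u_i^T b / sigma_i) v_i, i.e. (2.7.1); assumes A is square and nonsingular."""
    U, s, Vt = np.linalg.svd(A)
    return Vt.T @ ((U.T @ b) / s)


A = np.array([[1.0, 1.0],
              [1.0, 1.0 + 1e-8]])            # sigma_n is roughly 5e-9
b = A @ np.array([1.0, 1.0])
U, s, Vt = np.linalg.svd(A)

# A perturbation of size delta along u_n moves x by delta / sigma_n along v_n.
delta = 1e-10
dx = solve_via_svd(A, delta * U[:, -1])      # by linearity, this is the change in x
print(np.linalg.norm(dx), delta / s[-1])     # both about 0.02
```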

It should come as no surprise that the magnitude of $\sigma_n$ should have a bearing on the sensitivity of the $Ax = b$ problem when we recall from Theorem 2.5.3 that $\sigma_n$ is the distance from $A$ to the set of singular matrices. As the matrix of coefficients approaches this set, it is intuitively clear that the solution $x$ should be increasingly sensitive to perturbations.


2.7.2 Condition 


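A precise measure of linear system sensitivity can be obtained by considering the parameterized system

$$(A + \epsilon F) x(\epsilon) = b + \epsilon f, \qquad x(0) = x$$

where $F \in \mathbb{R}^{n \times n}$ and $f \in \mathbb{R}^n$. If $A$ is nonsingular, then it is clear that $x(\epsilon)$ is differentiable in a neighborhood of zero. Moreover, differentiating with respect to $\epsilon$ and setting $\epsilon = 0$ gives $\dot{x}(0) = A^{-1}(f - Fx)$, and thus the Taylor series expansion for $x(\epsilon)$ has the form

$$x(\epsilon) = x + \epsilon \dot{x}(0) + O(\epsilon^2).$$

Using any vector norm and consistent matrix norm we obtain

$$\frac{\| x(\epsilon) - x \|}{\| x \|} \le |\epsilon| \, \| A^{-1} \| \left\{ \frac{\| f \|}{\| x \|} + \| F \| \right\} + O(\epsilon^2). \qquad (2.7.2)$$

For square matrices $A$ define the condition number $\kappa(A)$ by

$$\kappa(A) = \| A \| \, \| A^{-1} \| \qquad (2.7.3)$$

with the convention that $\kappa(A) = \infty$ for singular $A$. Using the inequality $\| b \| \le \| A \| \, \| x \|$ it follows from (2.7.2) that

$$\frac{\| x(\epsilon) - x \|}{\| x \|} \le \kappa(A) (\rho_A + \rho_b) + O(\epsilon^2) \qquad (2.7.4)$$

where

$$\rho_A = |\epsilon| \frac{\| F \|}{\| A \|} \quad \text{and} \quad \rho_b = |\epsilon| \frac{\| f \|}{\| b \|}$$

represent the relative errors in $A$ and $b$, respectively. Thus, the relative error in $x$ can be $\kappa(A)$ times the relative error in $A$ and $b$. In this sense, the condition number $\kappa(A)$ quantifies the sensitivity of the $Ax = b$ problem.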
Note that $\kappa(\cdot)$ depends on the underlying norm and subscripts are used accordingly, e.g.,

$$\kappa_2(A) = \| A \|_2 \, \| A^{-1} \|_2 = \frac{\sigma_1(A)}{\sigma_n(A)}. \qquad (2.7.5)$$

Thus, the 2-norm condition of a matrix $A$ measures the elongation of the hyperellipsoid $\{ Ax : \| x \|_2 = 1 \}$.

We mention two other characterizations of the condition number. For $p$-norm condition numbers, we have

$$\frac{1}{\kappa_p(A)} = \min_{A + \Delta A \text{ singular}} \frac{\| \Delta A \|_p}{\| A \|_p}. \qquad (2.7.6)$$

This result may be found in Kahan (1966) and shows that $\kappa_p(A)$ measures the relative $p$-norm distance from $A$ to the set of singular matrices.
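For $p = 2$ both (2.7.5) and (2.7.6) can be checked directly from the SVD. The NumPy sketch below (NumPy assumed, random data) compares $\| A \|_2 \| A^{-1} \|_2$ with $\sigma_1 / \sigma_n$ and with `np.linalg.cond`. It then exhibits the rank-one perturbation $\Delta A = -\sigma_n u_n v_n^T$, which makes $A + \Delta A$ singular with $\| \Delta A \|_2 / \| A \|_2 = 1 / \kappa_2(A)$.

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((4, 4))
U, s, Vt = np.linalg.svd(A)

kappa2 = np.linalg.norm(A, 2) * np.linalg.norm(np.linalg.inv(A), 2)
print(np.isclose(kappa2, s[0] / s[-1]), np.isclose(kappa2, np.linalg.cond(A, 2)))   # True True

# dA = -sigma_n u_n v_n^T removes the smallest singular value, so A + dA is singular,
# and ||dA||_2 / ||A||_2 = sigma_n / sigma_1 = 1 / kappa_2(A).
dA = -s[-1] * np.outer(U[:, -1], Vt[-1, :])
print(np.isclose(np.linalg.norm(dA, 2) / np.linalg.norm(A, 2), 1.0 / kappa2))      # True
print(np.linalg.svd(A + dA, compute_uv=False)[-1] < 1e-12 * s[0])                  # True
```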
For any norm, we also have

$$\kappa(A) = \lim_{\epsilon \to 0} \sup_{\| \Delta A \| \le \epsilon \| A \|} \frac{1}{\epsilon} \frac{\| (A + \Delta A)^{-1} - A^{-1} \|}{\| A^{-1} \|}. \qquad (2.7.7)$$

This imposing result merely says that the condition number is a normalized Fréchet derivative of the map $A \to A^{-1}$. Further details may be found in Rice (1966b). Recall that we were initially led to $\kappa(A)$ through differentiation.
If $\kappa(A)$ is large, then $A$ is said to be an ill-conditioned matrix. Note that this is a norm-dependent property². However, any two condition numbers $\kappa_\alpha(\cdot)$ and $\kappa_\beta(\cdot)$ on $\mathbb{R}^{n \times n}$ are equivalent in that constants $c_1$ and $c_2$ can be found for which

$$c_1 \kappa_\alpha(A) \le \kappa_\beta(A) \le c_2 \kappa_\alpha(A), \qquad A \in \mathbb{R}^{n \times n}.$$

For example, on $\mathbb{R}^{n \times n}$ we have

$$\begin{aligned} \tfrac{1}{n} \kappa_2(A) &\le \kappa_1(A) \le n \, \kappa_2(A) \\ \tfrac{1}{n} \kappa_\infty(A) &\le \kappa_2(A) \le n \, \kappa_\infty(A) \\ \tfrac{1}{n^2} \kappa_1(A) &\le \kappa_\infty(A) \le n^2 \, \kappa_1(A). \end{aligned} \qquad (2.7.8)$$

Thus, if a matrix is ill-conditioned in the $\alpha$-norm, it is ill-conditioned in the $\beta$-norm modulo the constants $c_1$ and $c_2$ above.

For any of the $p$-norms, we have $\kappa_p(A) \ge 1$. Matrices with small condition numbers are said to be well-conditioned. In the 2-norm, orthogonal matrices are perfectly conditioned in that $\kappa_2(Q) = 1$ if $Q$ is orthogonal.
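The equivalence bounds (2.7.8) are easy to test on an example. The sketch assumes NumPy, whose `np.linalg.norm` computes the matrix 1-, 2-, and $\infty$-norms for `ord=1, 2, np.inf`. The helper `kappa` is hypothetical and forms the explicit inverse, which is acceptable for a small illustration but is not how condition numbers are estimated in practice (see §3.5).

```python
import numpy as np


def kappa(A, p):
    """kappa_p(A) = ||A||_p ||A^{-1}||_p for p in {1, 2, np.inf}."""
    return np.linalg.norm(A, p) * np.linalg.norm(np.linalg.inv(A), p)


rng = np.random.default_rng(7)
n = 6
A = rng.standard_normal((n, n))
k1, k2, kinf = kappa(A, 1), kappa(A, 2), kappa(A, np.inf)

# The three two-sided bounds of (2.7.8).
print(k2 / n <= k1 <= n * k2, kinf / n <= k2 <= n * kinf, k1 / n**2 <= kinf <= n**2 * k1)   # True True True

# Orthogonal matrices are perfectly conditioned in the 2-norm.
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
print(np.isclose(kappa(Q, 2), 1.0))   # True
```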


2.7.3 Determinants and Nearness to Singularity 


It is natural to consider how well determinant size measures ill-conditioning. If $\det(A) = 0$ is equivalent to singularity, is $\det(A) \approx 0$ equivalent to near singularity? Unfortunately, there is little correlation between $\det(A)$ and the condition of $Ax = b$. For example, the matrix $B_n$ defined by

$$B_n = \begin{bmatrix} 1 & -1 & \cdots & -1 \\ 0 & 1 & \cdots & -1 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix} \in \mathbb{R}^{n \times n} \qquad (2.7.9)$$

has determinant 1, but $\kappa_\infty(B_n) = n 2^{n-1}$. On the other hand, a very well-conditioned matrix can have a very small determinant. For example,

$$D_n = \mathrm{diag}(10^{-1}, \ldots, 10^{-1}) \in \mathbb{R}^{n \times n}$$

satisfies $\kappa_p(D_n) = 1$ although $\det(D_n) = 10^{-n}$.
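The two examples above can be reproduced with NumPy (an assumption; `B` and `kappa_inf` are hypothetical helper names). For $n = 10$ the matrix $B_n$ has determinant 1 but $\kappa_\infty(B_n) = 10 \cdot 2^9 = 5120$, while $D_n$ is perfectly conditioned with determinant $10^{-10}$.

```python
import numpy as np


def B(n):
    """The matrix B_n of (2.7.9): ones on the diagonal and -1 everywhere above it."""
    return np.eye(n) - np.triu(np.ones((n, n)), k=1)


def kappa_inf(A):
    return np.linalg.norm(A, np.inf) * np.linalg.norm(np.linalg.inv(A), np.inf)


n = 10
# Expect det(B_n) = 1 and kappa_inf(B_n) = n 2^(n-1) = 5120, up to rounding.
print(np.linalg.det(B(n)), kappa_inf(B(n)), n * 2 ** (n - 1))

D = 0.1 * np.eye(n)
# Expect det(D_n) close to 1e-10 and kappa_inf(D_n) close to 1.
print(np.linalg.det(D), kappa_inf(D))
```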


2.7.4 A Rigorous Norm Bound 


Recall that the derivation of (2.7.4) was valuable because it highlighted the connection between $\kappa(A)$ and the rate of change of $x(\epsilon)$ at $\epsilon = 0$. However, it is a little unsatisfying because it is contingent on $\epsilon$ being “small enough” and because it sheds no light on the size of the $O(\epsilon^2)$ term. In this and the next subsection we develop some additional $Ax = b$ perturbation theorems that are completely rigorous.

²It also depends upon the definition of “large.” The matter is pursued in §3.5.

We first establish a useful lemma that indicates in terms of $\kappa(A)$ when
we can expect a perturbed system to be nonsingular. 


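Lemma 2.7.1 Suppose

$$\begin{aligned} Ax &= b, & A &\in \mathbb{R}^{n \times n}, \; 0 \ne b \in \mathbb{R}^n \\ (A + \Delta A) y &= b + \Delta b, & \Delta A &\in \mathbb{R}^{n \times n}, \; \Delta b \in \mathbb{R}^n \end{aligned}$$

with $\| \Delta A \| \le \epsilon \| A \|$ and $\| \Delta b \| \le \epsilon \| b \|$. If $\epsilon \kappa(A) = r < 1$, then $A + \Delta A$ is nonsingular and

$$\frac{\| y \|}{\| x \|} \le \frac{1 + r}{1 - r}.$$

Proof. Since $\| A^{-1} \Delta A \| \le \epsilon \| A^{-1} \| \, \| A \| = r < 1$ it follows from Theorem 2.3.4 that $A + \Delta A$ is nonsingular. Using Lemma 2.3.3 and the equality $(I + A^{-1} \Delta A) y = x + A^{-1} \Delta b$ we find

$$\begin{aligned} \| y \| &\le \| (I + A^{-1} \Delta A)^{-1} \| \left( \| x \| + \epsilon \| A^{-1} \| \, \| b \| \right) \\ &\le \frac{1}{1 - r} \left( \| x \| + \epsilon \| A^{-1} \| \, \| b \| \right) = \frac{1}{1 - r} \left( \| x \| + r \frac{\| b \|}{\| A \|} \right). \end{aligned}$$

Since $\| b \| = \| Ax \| \le \| A \| \, \| x \|$ it follows that

$$\| y \| \le \frac{1}{1 - r} \left( \| x \| + r \| x \| \right). \qquad \square$$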

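We are now set to establish a rigorous $Ax = b$ perturbation bound.
Theorem 2.7.2 If the conditions of Lemma 2.7.1 hold, then

$$\frac{\| y - x \|}{\| x \|} \le \frac{2r}{1 - r}. \qquad (2.7.10)$$

Proof. Since

$$y - x = A^{-1} \Delta b - A^{-1} \Delta A \, y \qquad (2.7.11)$$

we have $\| y - x \| \le \epsilon \| A^{-1} \| \, \| b \| + \epsilon \| A^{-1} \| \, \| A \| \, \| y \|$ and so

$$\frac{\| y - x \|}{\| x \|} \le \epsilon \kappa(A) \frac{\| b \|}{\| A \| \, \| x \|} + \epsilon \kappa(A) \frac{\| y \|}{\| x \|} \le \epsilon \kappa(A) \left( 1 + \frac{1 + r}{1 - r} \right) = \frac{2r}{1 - r}. \qquad \square$$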
Example 2.7.1 The $Ax = b$ problem

$$\begin{bmatrix} 1 & 0 \\ 0 & 10^{-6} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 1 \\ 10^{-6} \end{bmatrix}$$

has solution $x = (1, 1)^T$ and condition $\kappa_\infty(A) = 10^6$. If $\Delta b = (10^{-6}, 0)^T$, $\Delta A = 0$, and $(A + \Delta A) y = b + \Delta b$, then $y = (1 + 10^{-6}, 1)^T$ and the inequality (2.7.10) says

$$10^{-6} = \frac{\| x - y \|_\infty}{\| x \|_\infty} \ll \frac{\| \Delta b \|_\infty}{\| b \|_\infty} \kappa_\infty(A) = 10^{-6} \cdot 10^{6} = 1.$$

Thus, the upper bound in (2.7.10) can be a gross overestimate of the error induced by the perturbation. On the other hand, if $\Delta b = (0, 10^{-6})^T$, $\Delta A = 0$, and $(A + \Delta A) y = b + \Delta b$, then this inequality says

$$1 = \frac{\| x - y \|_\infty}{\| x \|_\infty} \le 2 \times 10^{-6} \cdot 10^{6}.$$

Thus, there are perturbations for which the bound in (2.7.10) is essentially attained.
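Example 2.7.1 can be replayed numerically. This NumPy sketch (NumPy assumed; `rel_change` is a hypothetical helper) solves the two perturbed systems and prints the observed relative change next to the first-order estimate $(\| \Delta b \|_\infty / \| b \|_\infty) \, \kappa_\infty(A)$.

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [0.0, 1e-6]])
b = np.array([1.0, 1e-6])
x = np.linalg.solve(A, b)                       # (1, 1)
kinf = np.linalg.norm(A, np.inf) * np.linalg.norm(np.linalg.inv(A), np.inf)   # 1e6


def rel_change(db):
    """Relative infinity-norm change in the solution when b is replaced by b + db (dA = 0)."""
    y = np.linalg.solve(A, b + db)
    return np.linalg.norm(y - x, np.inf) / np.linalg.norm(x, np.inf)


for db in (np.array([1e-6, 0.0]), np.array([0.0, 1e-6])):
    estimate = np.linalg.norm(db, np.inf) / np.linalg.norm(b, np.inf) * kinf
    print(rel_change(db), estimate)
```

The first perturbation changes $x$ by about $10^{-6}$ against an estimate of 1, while the second changes it by 1, so the estimate is essentially attained.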


2.7.5 Some Rigorous Componentwise Bounds 


We conclude this section by showing that a more refined perturbation the- 
ory is possible if componentwise perturbation bounds are in effect and if 
we make use of the absolute value notation. 


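Theorem 2.7.3 Suppose

$$\begin{aligned} Ax &= b, & A &\in \mathbb{R}^{n \times n}, \; 0 \ne b \in \mathbb{R}^n \\ (A + \Delta A) y &= b + \Delta b, & \Delta A &\in \mathbb{R}^{n \times n}, \; \Delta b \in \mathbb{R}^n \end{aligned}$$

and that $|\Delta A| \le \epsilon |A|$ and $|\Delta b| \le \epsilon |b|$. If $\epsilon \kappa_\infty(A) = r < 1$, then $A + \Delta A$ is nonsingular and

$$\frac{\| y - x \|_\infty}{\| x \|_\infty} \le \frac{2\epsilon}{1 - r} \, \big\| \, |A^{-1}| \, |A| \, \big\|_\infty.$$

Proof. Since $\| \Delta A \|_\infty \le \epsilon \| A \|_\infty$ and $\| \Delta b \|_\infty \le \epsilon \| b \|_\infty$ the conditions of Lemma 2.7.1 are satisfied in the infinity norm. This implies that $A + \Delta A$ is nonsingular and

$$\frac{\| y \|_\infty}{\| x \|_\infty} \le \frac{1 + r}{1 - r}.$$

Now using (2.7.11) and $|b| = |Ax| \le |A| \, |x|$ we find

$$\begin{aligned} |y - x| &\le |A^{-1}| \, |\Delta b| + |A^{-1}| \, |\Delta A| \, |y| \\ &\le \epsilon |A^{-1}| \, |b| + \epsilon |A^{-1}| \, |A| \, |y| \le \epsilon |A^{-1}| \, |A| \left( |x| + |y| \right). \end{aligned}$$

If we take norms, then

$$\| y - x \|_\infty \le \epsilon \, \big\| \, |A^{-1}| \, |A| \, \big\|_\infty \left( \| x \|_\infty + \frac{1 + r}{1 - r} \| x \|_\infty \right).$$

The theorem follows upon division by $\| x \|_\infty$. □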


We refer to the quantity $\| \, |A^{-1}| \, |A| \, \|_\infty$ as the Skeel condition number. It has been effectively used in the analysis of several important linear system computations. See §3.5.
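The Skeel condition number is unaffected by row scaling. If $D$ is a nonsingular diagonal matrix, then $|(DA)^{-1}| \, |DA| = |A^{-1}| \, |D^{-1}| \, |D| \, |A| = |A^{-1}| \, |A|$, whereas $\kappa_\infty(DA)$ can be far larger than $\kappa_\infty(A)$. A short NumPy sketch (helper names hypothetical) illustrates this, and also that $\| \, |A^{-1}| \, |A| \, \|_\infty \le \kappa_\infty(A)$.

```python
import numpy as np


def skeel_cond(A):
    """|| |A^{-1}| |A| ||_inf, the Skeel condition number."""
    return np.linalg.norm(np.abs(np.linalg.inv(A)) @ np.abs(A), np.inf)


def kappa_inf(A):
    return np.linalg.norm(A, np.inf) * np.linalg.norm(np.linalg.inv(A), np.inf)


A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
D = np.diag([1.0, 1e8])            # a badly scaled row scaling

print(skeel_cond(A), kappa_inf(A))            # about 13 and 21
print(skeel_cond(D @ A), kappa_inf(D @ A))    # about 13 and 1.4e9
```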

Lastly, we report on the results of Oettli and Prager (1964) that indicate when an approximate solution $\hat{x} \in \mathbb{R}^n$ to the $n$-by-$n$ system $Ax = b$ satisfies a perturbed system with prescribed structure. In particular, suppose $E \in \mathbb{R}^{n \times n}$ and $f \in \mathbb{R}^n$ are given and have nonnegative entries. We seek $\Delta A \in \mathbb{R}^{n \times n}$, $\Delta b \in \mathbb{R}^n$, and $\omega \ge 0$ such that

$$(A + \Delta A) \hat{x} = b + \Delta b, \qquad |\Delta A| \le \omega E, \quad |\Delta b| \le \omega f. \qquad (2.7.12)$$

Note that by properly choosing $E$ and $f$ the perturbed system can take on certain qualities. For example, if $E = |A|$ and $f = |b|$ and $\omega$ is small, then $\hat{x}$ satisfies a nearby system in the componentwise sense. Oettli and Prager (1964) show that for a given $A$, $b$, $\hat{x}$, $E$, and $f$ the smallest $\omega$ possible in (2.7.12) is given by

$$\omega_{\min} = \max_{1 \le i \le n} \frac{|A\hat{x} - b|_i}{(E|\hat{x}| + f)_i}.$$

If $A\hat{x} = b$ then $\omega_{\min} = 0$. On the other hand, if $\omega_{\min} = \infty$, then $\hat{x}$ does not satisfy any system of the prescribed perturbation structure.
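The Oettli-Prager quantity is cheap to evaluate. The sketch below (NumPy assumed, helper names hypothetical) computes $\omega_{\min}$, treating $0/0$ as 0 by assumption. It also builds one perturbation that attains it: with $t_i = (b - A\hat{x})_i / (E|\hat{x}| + f)_i$, the choice $\Delta A = \mathrm{diag}(t) \, E \, \mathrm{diag}(\mathrm{sign}(\hat{x}))$, $\Delta b = -\mathrm{diag}(t) f$ satisfies (2.7.12) with $\omega = \max_i |t_i|$, as one can verify by substitution.

```python
import numpy as np


def oettli_prager_omega(A, b, xhat, E, f):
    """omega_min = max_i |A xhat - b|_i / (E|xhat| + f)_i, taking 0/0 as 0 (an assumption)."""
    r = np.abs(A @ xhat - b)
    denom = E @ np.abs(xhat) + f
    with np.errstate(divide="ignore", invalid="ignore"):
        ratios = np.where(r == 0, 0.0, r / denom)   # r > 0 with denom = 0 gives inf
    return float(np.max(ratios))


def oettli_prager_perturbation(A, b, xhat, E, f):
    """One choice of dA, db with (A + dA) xhat = b + db, |dA| <= omega E, |db| <= omega f.

    Assumes every entry of E|xhat| + f is positive.
    """
    t = (b - A @ xhat) / (E @ np.abs(xhat) + f)
    dA = t[:, None] * E * np.sign(xhat)[None, :]    # diag(t) E diag(sign(xhat))
    db = -t * f                                     # -diag(t) f
    return dA, db


A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
xhat = np.linalg.solve(A, b) + np.array([1e-3, 0.0])   # a deliberately inaccurate solution
E, f = np.abs(A), np.abs(b)
omega = oettli_prager_omega(A, b, xhat, E, f)
dA, db = oettli_prager_perturbation(A, b, xhat, E, f)
print(omega)
```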


Problems 


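P2.7.1 Show that if $\| I \| \ge 1$, then $\kappa(A) \ge 1$.


P2.7.2 Show that for a given norm, $\kappa(AB) \le \kappa(A)\kappa(B)$ and that $\kappa(\alpha A) = \kappa(A)$ for all nonzero $\alpha$.
P2.7.3 Relate the 2-norm condition of $X \in \mathbb{R}^{m \times n}$ $(m \ge n)$ to the 2-norm condition of the matrices

$$B = \begin{bmatrix} I_m & X \\ 0 & I_n \end{bmatrix} \quad \text{and} \quad C = \begin{bmatrix} X \\ I_n \end{bmatrix}.$$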


Notes and References for Sec. 2.7
The condition concept is thoroughly investigated in


J. Rice (1966). “A Theory of Condition,” SIAM J. Num. Anal. 3, 287-310.
W. Kahan (1966). “Numerical Linear Algebra,” Canadian Math. Bull. 9, 757-801.


References for componentwise perturbation theory include

W. Oettli and W. Prager (1964). “Compatibility of Approximate Solutions of Linear Equations with Given Error Bounds for Coefficients and Right Hand Sides,” Numer. Math. 6, 405-409.

J.E. Cope and B.W. Rust (1979). “Bounds on solutions of systems with accurate data,” SIAM J. Num. Anal. 16, 950-63.

R.D. Skeel (1979). “Scaling for numerical stability in Gaussian Elimination,” J. ACM 26, 494-526.

J.W. Demmel (1992). “The Componentwise Distance to the Nearest Singular Matrix,” SIAM J. Matrix Anal. Appl. 13, 10-19.

D.J. Higham and N.J. Higham (1992). “Componentwise Perturbation Theory for Linear Systems with Multiple Right-Hand Sides,” Lin. Alg. and Its Applic. 174, 111-129.

N.J. Higham (1994). “A Survey of Componentwise Perturbation Theory in Numerical Linear Algebra,” in Mathematics of Computation 1943-1993: A Half Century of
Chapter 3


General Linear Systems 


§3.1 Triangular Systems

§3.2 The LU Factorization

§3.3 Roundoff Analysis of Gaussian Elimination

§3.4 Pivoting

§3.5 Improving and Estimating Accuracy


The problem of solving a linear system $Ax = b$ is central in scientific computation. In this chapter we focus on the method of Gaussian elimination, the algorithm of choice when $A$ is square, dense, and unstructured. When $A$ does not fall into this category, then the algorithms of Chapters 4, 5, and 10 are of interest. Some parallel $Ax = b$ solvers are discussed in Chapter 6.

We motivate the method of Gaussian elimination in §3.1 by discussing the ease with which triangular systems can be solved. The conversion of a general system to triangular form via Gauss transformations is then presented in §3.2 where the “language” of matrix factorizations is introduced. Unfortunately, the derived method behaves very poorly on a nontrivial class of problems. Our error analysis in §3.3 pinpoints the difficulty and motivates §3.4, where the concept of pivoting is introduced. In the final section we comment upon the important practical issues associated with scaling, iterative improvement, and condition estimation.


Before You Begin 


Chapter 1, §§2.1-2.5, and §2.7 are assumed. Complementary references include Forsythe and Moler (1967), Stewart (1973), Hager (1988), Watkins (1991), Ciarlet (1992), Datta (1995), Higham (1996), Trefethen and Bau (1996), and Demmel (1996). Some MATLAB functions important to this chapter are lu, cond, rcond, and the “backslash” operator “\”. LAPACK connections include


Solve $AX = B$, $A^T X = B$ with error bounds
Solve $AX = B$, $A^T X = B$
$A^{-1}$


LAPACK: General Linear Systems
Solve $AX = B$
Condition estimate via $PA = LU$
Improve $AX = B$, $A^T X = B$, $A^H X = B$ solutions with error bounds
Solve $AX = B$, $A^T X = B$, $A^H X = B$ with condition estimate
$PA = LU$
Solve $AX = B$, $A^T X = B$, $A^H X = B$ via $PA = LU$
$A^{-1}$
Equilibration


3.1 Triangular Systems 
Traditional factorization methods for linear systems involve the conversion of the given square system to a triangular system that has the same solution. This section is about the solution of triangular systems.
3.1.1 Forward Substitution
Consider the following 2-by-2 lower triangular system:

$$\begin{bmatrix} \ell_{11} & 0 \\ \ell_{21} & \ell_{22} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \end{bmatrix}.$$

If $\ell_{11} \ell_{22} \ne 0$, then the unknowns can be determined sequentially:

$$\begin{aligned} x_1 &= b_1 / \ell_{11} \\ x_2 &= (b_2 - \ell_{21} x_1) / \ell_{22}. \end{aligned}$$

This is the 2-by-2 version of an algorithm known as forward substitution. The general procedure is obtained by solving the $i$th equation in $Lx = b$ for $x_i$:

$$x_i = \left( b_i - \sum_{j=1}^{i-1} \ell_{ij} x_j \right) \Big/ \ell_{ii}.$$

If this is evaluated for $i = 1{:}n$, then a complete specification of $x$ is obtained. Note that at the $i$th stage the dot product of $L(i, 1{:}i-1)$ and $x(1{:}i-1)$ is required. Since $b_i$ only is involved in the formula for $x_i$, the former may be overwritten by the latter:


Algorithm 3.1.1 (Forward Substitution: Row Version) If $L \in \mathbb{R}^{n \times n}$ is lower triangular and $b \in \mathbb{R}^n$, then this algorithm overwrites $b$ with the solution to $Lx = b$. $L$ is assumed to be nonsingular.

    b(1) = b(1)/L(1, 1)
    for i = 2:n
        b(i) = (b(i) - L(i, 1:i-1)b(1:i-1))/L(i, i)
    end

This algorithm requires $n^2$ flops. Note that $L$ is accessed by row. The computed solution $\hat{x}$ satisfies:

$$(L + F)\hat{x} = b, \qquad |F| \le n \mathbf{u} |L| + O(\mathbf{u}^2). \qquad (3.1.1)$$

For a proof, see Higham (1996). It says that the computed solution exactly satisfies a slightly perturbed system. Moreover, each entry in the perturbing matrix $F$ is small relative to the corresponding element of $L$.
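Algorithm 3.1.1 translates almost line for line into Python with NumPy (our choice of language; the function name is hypothetical). Python indexes from 0, so $b(1)$ becomes `b[0]`, and the sketch overwrites a copy of $b$ rather than the caller's array. The test matrix is the 3-by-3 example used in §3.1.3.

```python
import numpy as np


def forward_sub_row(L, b):
    """Algorithm 3.1.1 (row version): return x with Lx = b, L lower triangular and nonsingular.

    Works on a float copy of b, which plays the role of the overwritten b in the text.
    Indices are 0-based, so b(1) in the text is b[0] here.
    """
    b = np.array(b, dtype=float)
    n = b.size
    b[0] = b[0] / L[0, 0]
    for i in range(1, n):
        # Dot product of L(i, 1:i-1) with the already computed x(1:i-1).
        b[i] = (b[i] - L[i, :i] @ b[:i]) / L[i, i]
    return b


L = np.array([[2.0, 0.0, 0.0],
              [1.0, 5.0, 0.0],
              [7.0, 9.0, 8.0]])
print(forward_sub_row(L, [6.0, 2.0, 5.0]))   # approximately [3, -0.2, -1.775]
```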


3.1.2 Back Substitution 


The analogous algorithm for upper triangular systems $Ux = b$ is called back-substitution. The recipe for $x_i$ is prescribed by

$$x_i = \left( b_i - \sum_{j=i+1}^{n} u_{ij} x_j \right) \Big/ u_{ii}$$

and once again $b_i$ can be overwritten by $x_i$.


Algorithm 3.1.2 (Back Substitution: Row Version) If $U \in \mathbb{R}^{n \times n}$ is upper triangular and $b \in \mathbb{R}^n$, then the following algorithm overwrites $b$ with the solution to $Ux = b$. $U$ is assumed to be nonsingular.

    b(n) = b(n)/U(n, n)
    for i = n-1:-1:1
        b(i) = (b(i) - U(i, i+1:n)b(i+1:n))/U(i, i)
    end

This algorithm requires $n^2$ flops and accesses $U$ by row. The computed solution $\hat{x}$ obtained by the algorithm can be shown to satisfy

$$(U + F)\hat{x} = b, \qquad |F| \le n \mathbf{u} |U| + O(\mathbf{u}^2). \qquad (3.1.2)$$
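A corresponding sketch of Algorithm 3.1.2, under the same assumptions (Python with NumPy, 0-based indexing, hypothetical function name):

```python
import numpy as np


def back_sub_row(U, b):
    """Algorithm 3.1.2 (row version), 0-based: return x with Ux = b, U upper triangular and nonsingular."""
    b = np.array(b, dtype=float)
    n = b.size
    b[n - 1] = b[n - 1] / U[n - 1, n - 1]
    for i in range(n - 2, -1, -1):
        # Dot product of U(i, i+1:n) with the already computed x(i+1:n).
        b[i] = (b[i] - U[i, i + 1:] @ b[i + 1:]) / U[i, i]
    return b


U = np.array([[1.0, 2.0],
              [0.0, 4.0]])
print(back_sub_row(U, [5.0, 8.0]))   # [1. 2.]
```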
3.1.3 Column Oriented Versions


Column oriented versions of the above procedures can be obtained by reversing loop orders. To understand what this means from the algebraic point of view, consider forward substitution. Once $x_1$ is resolved, it can be removed from equations 2 through $n$ and we proceed with the reduced system $L(2{:}n, 2{:}n)x(2{:}n) = b(2{:}n) - x(1)L(2{:}n, 1)$. We then compute $x_2$ and remove it from equations 3 through $n$, etc. Thus, if this approach is applied to

$$\begin{bmatrix} 2 & 0 & 0 \\ 1 & 5 & 0 \\ 7 & 9 & 8 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 6 \\ 2 \\ 5 \end{bmatrix}$$

we find $x_1 = 3$ and then deal with the 2-by-2 system

$$\begin{bmatrix} 5 & 0 \\ 9 & 8 \end{bmatrix} \begin{bmatrix} x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 2 \\ 5 \end{bmatrix} - 3 \begin{bmatrix} 1 \\ 7 \end{bmatrix} = \begin{bmatrix} -1 \\ -16 \end{bmatrix}.$$

Here is the complete procedure with overwriting.


Algorithm 3.1.3 (Forward Substitution: Column Version) If $L \in \mathbb{R}^{n \times n}$ is lower triangular and $b \in \mathbb{R}^n$, then this algorithm overwrites $b$ with the solution to $Lx = b$. $L$ is assumed to be nonsingular.

    for j = 1:n-1
        b(j) = b(j)/L(j, j)
        b(j+1:n) = b(j+1:n) - b(j)L(j+1:n, j)
    end
    b(n) = b(n)/L(n, n)

It is also possible to obtain a column-oriented saxpy procedure for back-substitution.


Algorithm 3.1.4 (Back Substitution: Column Version) If $U \in \mathbb{R}^{n \times n}$ is upper triangular and $b \in \mathbb{R}^n$, then this algorithm overwrites $b$ with the solution to $Ux = b$. $U$ is assumed to be nonsingular.

    for j = n:-1:2
        b(j) = b(j)/U(j, j)
        b(1:j-1) = b(1:j-1) - b(j)U(1:j-1, j)
    end
    b(1) = b(1)/U(1, 1)


Note that the dominant operation in both Algorithms 3.1.3 and 3.1.4 is the saxpy operation. The roundoff behavior of these saxpy implementations is essentially the same as for the dot product versions.
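The column-oriented Algorithms 3.1.3 and 3.1.4 can be sketched the same way (Python with NumPy assumed, function names hypothetical). Each step is a saxpy on a slice of $b$. On the 3-by-3 example above, the first step of `forward_sub_col` leaves exactly the reduced right-hand side $(-1, -16)$.

```python
import numpy as np


def forward_sub_col(L, b):
    """Algorithm 3.1.3 (column/saxpy version), 0-based indices."""
    b = np.array(b, dtype=float)
    n = b.size
    for j in range(n - 1):
        b[j] = b[j] / L[j, j]
        b[j + 1:] -= b[j] * L[j + 1:, j]     # saxpy: remove x_j from equations j+1, ..., n
    b[n - 1] = b[n - 1] / L[n - 1, n - 1]
    return b


def back_sub_col(U, b):
    """Algorithm 3.1.4 (column/saxpy version), 0-based indices."""
    b = np.array(b, dtype=float)
    n = b.size
    for j in range(n - 1, 0, -1):
        b[j] = b[j] / U[j, j]
        b[:j] -= b[j] * U[:j, j]             # saxpy with column j of U
    b[0] = b[0] / U[0, 0]
    return b


L = np.array([[2.0, 0.0, 0.0],
              [1.0, 5.0, 0.0],
              [7.0, 9.0, 8.0]])
print(forward_sub_col(L, [6.0, 2.0, 5.0]))   # approximately [3, -0.2, -1.775]
```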

The accuracy of a computed solution to a triangular system is often 
surprisingly good. See Higham (1996). 
3.1.4 Multiple Right Hand Sides


Consider the problem of computing a solution $X \in \mathbb{R}^{n \times q}$ to $LX = B$ where $L \in \mathbb{R}^{n \times n}$ is lower triangular and $B \in \mathbb{R}^{n \times q}$. This is the multiple right hand side forward substitution problem. We show that such a problem can be solved by a block algorithm that is rich in matrix multiplication assuming that $q$ and $n$ are large enough. This turns out to be important in subsequent sections where various block factorization schemes are discussed. We mention that although we are considering here just the lower triangular problem, everything we say applies to the upper triangular case as well.

To develop a block forward substitution algorithm we partition the equation $LX = B$ as follows:

$$\begin{bmatrix} L_{11} & 0 & \cdots & 0 \\ L_{21} & L_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ L_{N1} & L_{N2} & \cdots & L_{NN} \end{bmatrix} \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_N \end{bmatrix} = \begin{bmatrix} B_1 \\ B_2 \\ \vdots \\ B_N \end{bmatrix}. \qquad (3.1.3)$$

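Assume that the diagonal blocks are square. Paralleling the development of Algorithm 3.1.3, we solve the system $L_{11} X_1 = B_1$ for $X_1$ and then remove $X_1$ from block equations 2 through $N$:

$$\begin{bmatrix} L_{22} & 0 & \cdots & 0 \\ L_{32} & L_{33} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ L_{N2} & L_{N3} & \cdots & L_{NN} \end{bmatrix} \begin{bmatrix} X_2 \\ X_3 \\ \vdots \\ X_N \end{bmatrix} = \begin{bmatrix} B_2 - L_{21} X_1 \\ B_3 - L_{31} X_1 \\ \vdots \\ B_N - L_{N1} X_1 \end{bmatrix}.$$

Continuing in this way we obtain the following block saxpy forward elimination scheme:

    for j = 1:N
        Solve L_jj X_j = B_j
        for i = j+1:N                                           (3.1.4)
            B_i = B_i - L_ij X_j
        end
    end

Notice that the i-loop oversees a single block saxpy update of the form

$$\begin{bmatrix} B_{j+1} \\ \vdots \\ B_N \end{bmatrix} = \begin{bmatrix} B_{j+1} \\ \vdots \\ B_N \end{bmatrix} - \begin{bmatrix} L_{j+1,j} \\ \vdots \\ L_{N,j} \end{bmatrix} X_j.$$

For this to be handled as a matrix multiplication in a given architecture it is clear that the blocking in (3.1.3) must give sufficiently “big” $X_j$. Let us assume that this is the case if each $X_j$ has at least $r$ rows. This can be accomplished if $N = \mathrm{ceil}(n/r)$ and $X_1, \ldots, X_{N-1} \in \mathbb{R}^{r \times q}$ and $X_N \in \mathbb{R}^{(n - (N-1)r) \times q}$.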
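Scheme (3.1.4) can be sketched in NumPy as follows (function names hypothetical). The diagonal-block solves use a column-oriented kernel, and the block saxpy for all blocks below the diagonal is a single matrix multiplication, which is where level-3 performance would come from. Blocks have $r$ rows except possibly the last, as described above.

```python
import numpy as np


def forward_sub_multi(L, B):
    """Column-oriented forward substitution applied to all columns of B at once (the level-2 kernel)."""
    X = np.array(B, dtype=float)
    n = X.shape[0]
    for j in range(n):
        X[j, :] /= L[j, j]
        X[j + 1:, :] -= np.outer(L[j + 1:, j], X[j, :])
    return X


def block_forward_sub(L, B, r):
    """Block scheme (3.1.4) with diagonal blocks of r rows; the last block may be smaller."""
    X = np.array(B, dtype=float)
    n = X.shape[0]
    for j0 in range(0, n, r):
        j1 = min(j0 + r, n)
        # Solve L_jj X_j = B_j.
        X[j0:j1, :] = forward_sub_multi(L[j0:j1, j0:j1], X[j0:j1, :])
        # Block saxpy B_i = B_i - L_ij X_j for every block i below j, done as one matrix multiply.
        X[j1:, :] -= L[j1:, j0:j1] @ X[j0:j1, :]
    return X


rng = np.random.default_rng(8)
n, q = 10, 4
L = np.tril(rng.standard_normal((n, n))) + n * np.eye(n)   # well-conditioned lower triangular
B = rng.standard_normal((n, q))
X = block_forward_sub(L, B, r=3)      # N = ceil(10/3) = 4 blocks with 3, 3, 3 and 1 rows
print(np.allclose(L @ X, B))          # True
```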




3.1.5 The Level-3 Fraction 


It is handy to adopt a measure that quantifies the amount of matrix multiplication in a given algorithm. To this end we define the level-3 fraction of an algorithm to be the fraction of flops that occur in the context of matrix multiplication. We call such flops level-3 flops.

Let us determine the level-3 fraction for (3.1.4) with the simplifying assumption that $n = rN$. (The same conclusions hold with the unequal blocking described above.) Because there are $N$ applications of $r$-by-$r$ forward elimination (the level-2 portion of the computation) and $n^2$ flops overall, the level-3 fraction is approximately given by

$$1 - \frac{N r^2}{n^2} = 1 - \frac{1}{N}.$$

Thus, for large $N$ almost all flops are level-3 flops and it makes sense to choose $N$ as large as possible subject to the constraint that the underlying architecture can achieve a high level of performance when processing block saxpy's of width at least $r = n/N$.
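The estimate $1 - 1/N$ can be checked by counting flops in scheme (3.1.4). The sketch below is plain Python. Its flop model, per right-hand side, charges $r_j^2$ flops for an $r_j$-by-$r_j$ triangular solve and $2 m r_j$ flops for updating the $m$ rows below a block; this model is our assumption. With equal blocks the level-2 and level-3 counts add up to $n^2$ and the fraction is exactly $1 - 1/N$.

```python
def level3_fraction(n, r):
    """Level-3 fraction of scheme (3.1.4), counted per right-hand side column.

    Flop model (an assumption): an rj-by-rj triangular solve costs rj**2 flops,
    and updating the m rows below a block of rj rows costs 2*m*rj flops.
    """
    level2 = level3 = 0
    for j0 in range(0, n, r):
        rj = min(r, n - j0)        # rows in this diagonal block
        below = n - (j0 + rj)      # rows still to be updated by the block saxpy
        level2 += rj * rj
        level3 += 2 * below * rj
    return level3 / (level2 + level3)


n = 600
for N in (2, 4, 10, 50):
    print(N, level3_fraction(n, n // N), 1 - 1 / N)
```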


3.1.6 Non-square Triangular System Solving 


The problem of solving nonsquare, $m$-by-$n$ triangular systems deserves some mention. Consider first the lower triangular case when $m \ge n$, i.e.,

$$\begin{bmatrix} L_{11} \\ L_{21} \end{bmatrix} x = \begin{bmatrix} b_1 \\ b_2 \end{bmatrix}, \qquad L_{11} \in \mathbb{R}^{n \times n}, \; b_1 \in \mathbb{R}^n, \quad L_{21} \in \mathbb{R}^{(m-n) \times n}, \; b_2 \in \mathbb{R}^{m-n}.$$

Assume that $L_{11}$ is lower triangular and nonsingular. If we apply forward elimination to $L_{11} x = b_1$, then $x$ solves the system provided $L_{21}(L_{11}^{-1} b_1) = b_2$. Otherwise, there is no solution to the overall system. In such a case least squares minimization may be appropriate. See Chapter 5.

Now consider the lower triangular system $Lx = b$ when the number of columns $n$ exceeds the number of rows $m$. In this case apply forward substitution to the square system $L(1{:}m, 1{:}m)x(1{:}m) = b$ and prescribe an arbitrary value for $x(m+1{:}n)$; this works because columns $m+1{:}n$ of a lower triangular $m$-by-$n$ matrix are zero. See §5.7 for additional comments on systems that have more unknowns than equations.
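The two nonsquare cases can be sketched in NumPy as follows. The function names and the relative tolerance used to test $L_{21} x = b_2$ are our assumptions. For $m \ge n$ the sketch returns `None` when the system has no solution. For $m < n$ the caller supplies the arbitrary trailing values.

```python
import numpy as np


def forward_sub(L, b):
    """Plain forward substitution for a square, nonsingular lower triangular L."""
    x = np.array(b, dtype=float)
    for i in range(x.size):
        x[i] = (x[i] - L[i, :i] @ x[:i]) / L[i, i]
    return x


def solve_tall_lower(L, b, tol=1e-12):
    """m >= n: solve L11 x = b1, then return x only if L21 x reproduces b2 (relative tolerance)."""
    m, n = L.shape
    x = forward_sub(L[:n, :], b[:n])
    resid = np.linalg.norm(L[n:, :] @ x - b[n:])
    return x if resid <= tol * max(1.0, np.linalg.norm(b)) else None


def solve_wide_lower(L, b, x_tail):
    """m < n: x(m+1:n) = x_tail is arbitrary; solve L(1:m, 1:m) x(1:m) = b.

    Columns m+1:n of a lower triangular m-by-n matrix are zero, so x_tail does not affect Lx.
    """
    m, n = L.shape
    return np.concatenate([forward_sub(L[:, :m], b), x_tail])


Lt = np.array([[2.0, 0.0], [1.0, 1.0], [3.0, 1.0]])
print(solve_tall_lower(Lt, np.array([2.0, 3.0, 5.0])))   # consistent: x = (1, 2)
print(solve_tall_lower(Lt, np.array([2.0, 3.0, 6.0])))   # inconsistent: None
```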

The handling of nonsquare upper triangular systems is similar. Details 
are left to the reader. 


3.1.7 Unit Triangular Systems 


A unit triangular matrix is a triangular matrix with ones on the diagonal. 
Many of the triangular matrix computations that follow have this added 
bit of structure. It clearly poses no difficulty in the above procedures. 
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3.1.8 The Algebra of Triangular Matrices 


For future reference we list a few properties about products and inverses of
triangular and unit triangular matrices.

• The inverse of an upper (lower) triangular matrix is upper (lower)
triangular.
• The product of two upper (lower) triangular matrices is upper (lower)
triangular.
• The inverse of a unit upper (lower) triangular matrix is unit upper
(lower) triangular.
• The product of two unit upper (lower) triangular matrices is unit
upper (lower) triangular.
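These facts are easy to spot-check numerically (a check, not the proof that P3.1.5 asks for). The sketch below uses exact `fractions.Fraction` arithmetic so that the unit diagonal and the zero pattern can be compared exactly; it covers the lower triangular case only (the upper case follows by transposition), and the helper names are hypothetical.

```python
from fractions import Fraction


def matmul(A, B):
    """Plain triple-loop product of list-of-lists matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def is_unit_lower(A):
    """True if A has ones on the diagonal and zeros above it."""
    n = len(A)
    return (all(A[i][i] == 1 for i in range(n))
            and all(A[i][j] == 0 for i in range(n) for j in range(i + 1, n)))


def lower_inverse(L):
    """Invert a nonsingular lower triangular L by forward substitution on each e_j."""
    n = len(L)
    X = [[Fraction(0)] * n for _ in range(n)]
    for j in range(n):
        for i in range(n):
            rhs = Fraction(1 if i == j else 0)
            X[i][j] = (rhs - sum(L[i][k] * X[k][j] for k in range(i))) / L[i][i]
    return X


L1 = [[1, 0, 0], [2, 1, 0], [3, 4, 1]]
L2 = [[1, 0, 0], [5, 1, 0], [6, 7, 1]]
print(is_unit_lower(matmul(L1, L2)), is_unit_lower(lower_inverse(L1)))  # True True
```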


Problems 
P3.1.1 Give an algorithm for computing a nonzero $z \in \mathbb{R}^n$ such that $Uz = 0$ where
$U \in \mathbb{R}^{n \times n}$ is upper triangular with $u_{nn} = 0$ and $u_{11} \cdots u_{n-1,n-1} \neq 0$.


P3.1.2 Discuss how the determinant of a square triangular matrix could be computed
with minimum risk of overflow and underflow.


P3.1.3 Rewrite Algorithm 3.1.4 given that U is stored by column in a length n(n+1)/2
array u.vec.


P3.1.4 Write a detailed version of (3.1.4). Do not assume that N divides n.
P3.1.5 Prove all the facts about triangular matrices that are listed in §3.1.8.


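P3.1.6 Suppose $S, T \in \mathbb{R}^{n \times n}$ are upper triangular and that $(ST - \lambda I)x = b$ is a nonsingular
system. Give an $O(n^2)$ algorithm for computing $x$. Note that the explicit
formation of $ST - \lambda I$ requires $O(n^3)$ flops. Hint: Suppose

$$S_+ = \begin{bmatrix} \sigma & u^T \\ 0 & S_c \end{bmatrix}, \qquad T_+ = \begin{bmatrix} \tau & v^T \\ 0 & T_c \end{bmatrix}, \qquad b_+ = \begin{bmatrix} \beta \\ b_c \end{bmatrix},$$

where $S_+ = S(k-1:n, k-1:n)$, $T_+ = T(k-1:n, k-1:n)$, $b_+ = b(k-1:n)$ and $\sigma, \tau, \beta \in \mathbb{R}$.
Show that if we have a vector $x_c$ such that

$$(S_cT_c - \lambda I)x_c = b_c$$

and $w_c = T_cx_c$ is available, then

$$x_+ = \begin{bmatrix} \gamma \\ x_c \end{bmatrix}, \qquad \gamma = \frac{\beta - \sigma v^T x_c - u^T w_c}{\sigma\tau - \lambda}$$

solves $(S_+T_+ - \lambda I)x_+ = b_+$. Observe that $x_+$ and $w_+ = T_+x_+$ each require $O(n-k)$
flops.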
P3.1.7 Suppose the matrices $R_1, \ldots, R_p \in \mathbb{R}^{n \times n}$ are all upper triangular. Give an
$O(pn^2)$ algorithm for solving the system $(R_1 \cdots R_p - \lambda I)x = b$ assuming that the matrix
of coefficients is nonsingular. Hint: Generalize the solution to the previous problem.

Notes and References for Sec. 3.1
The accuracy of triangular system solvers is analyzed in 


N.J. Higham (1989). “The Accuracy of Solutions to Triangular Systems,” SIAM J. Num.
Anal. 26, 1252-1265.

3.2 The LU Factorization


As we have just seen, triangular systems are “easy” to solve. The idea
behind Gaussian elimination is to convert a given system $Ax = b$ to an
equivalent triangular system. The conversion is achieved by taking appropriate
linear combinations of the equations. For example, in the system

$$\begin{aligned} 3x_1 + 5x_2 &= 9 \\ 6x_1 + 7x_2 &= 4 \end{aligned}$$

if we multiply the first equation by 2 and subtract it from the second we
obtain

$$\begin{aligned} 3x_1 + 5x_2 &= 9 \\ -3x_2 &= -14. \end{aligned}$$

This is n = 2 Gaussian elimination. Our objective in this section is to give 
a complete specification of this central procedure and to describe what it 
does in the language of matrix factorizations. This means showing that 
the algorithm computes a unit lower triangular matrix L and an upper 
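triangular matrix U so that A = LU, e.g.,

$$\begin{bmatrix} 3 & 5 \\ 6 & 7 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 2 & 1 \end{bmatrix} \begin{bmatrix} 3 & 5 \\ 0 & -3 \end{bmatrix}.$$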


The solution to the original $Ax = b$ problem is then found by a two-step
triangular solve process:

$$Ly = b, \quad Ux = y \quad \Longrightarrow \quad Ax = LUx = Ly = b.$$


The LU factorization is a “high-level” algebraic description of Gaussian 
elimination. Expressing the outcome of a matrix algorithm in the “lan- 
guage” of matrix factorizations is a worthwhile activity. It facilitates gen- 
eralization and highlights connections between algorithms that may appear 
very different at the scalar level. 


3.2.1 Gauss Transformations 


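To obtain a factorization description of Gaussian elimination we need a
matrix description of the zeroing process. At the n = 2 level if $x_1 \neq 0$ and
$\tau = x_2/x_1$, then

$$\begin{bmatrix} 1 & 0 \\ -\tau & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} x_1 \\ 0 \end{bmatrix}.$$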


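More generally, suppose $x \in \mathbb{R}^n$ with $x_k \neq 0$. If

$$\tau^T = (\underbrace{0, \ldots, 0}_{k}, \tau_{k+1}, \ldots, \tau_n), \qquad \tau_i = \frac{x_i}{x_k}, \quad i = k+1:n,$$

and we define

$$M_k = I - \tau e_k^T, \tag{3.2.1}$$

then

$$M_k x = \begin{bmatrix} 1 & \cdots & 0 & 0 & \cdots & 0 \\ \vdots & \ddots & \vdots & \vdots & & \vdots \\ 0 & \cdots & 1 & 0 & \cdots & 0 \\ 0 & \cdots & -\tau_{k+1} & 1 & \cdots & 0 \\ \vdots & & \vdots & \vdots & \ddots & \vdots \\ 0 & \cdots & -\tau_n & 0 & \cdots & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ \vdots \\ x_k \\ x_{k+1} \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} x_1 \\ \vdots \\ x_k \\ 0 \\ \vdots \\ 0 \end{bmatrix}.$$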


In general, a matrix of the form $M_k = I - \tau e_k^T \in \mathbb{R}^{n \times n}$ is a Gauss transformation
if the first k components of $\tau \in \mathbb{R}^n$ are zero. Such a matrix is
unit lower triangular. The components of $\tau(k+1:n)$ are called multipliers.
The vector $\tau$ is called the Gauss vector.
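A minimal sketch of (3.2.1) in Python: it builds the Gauss vector τ for a given x and k and applies $M_k = I - \tau e_k^T$ to x without ever forming $M_k$. Python indexing is 0-based, so the text's index k corresponds to `k - 1` here; the function names are our own.

```python
def gauss_vector(x, k):
    """Gauss vector tau for x and 0-based pivot index k (k + 1 in the text's notation).

    tau[0:k+1] = 0 and tau[i] = x[i] / x[k] for i > k; requires x[k] != 0."""
    if x[k] == 0:
        raise ZeroDivisionError("x[k] must be nonzero")
    return [0.0] * (k + 1) + [x[i] / x[k] for i in range(k + 1, len(x))]


def apply_gauss(tau, k, x):
    """Return M_k x = x - tau * x[k] without forming M_k = I - tau e_k^T."""
    return [x[i] - tau[i] * x[k] for i in range(len(x))]


x = [2.0, 4.0, -6.0, 1.0]
tau = gauss_vector(x, 1)       # [0.0, 0.0, -1.5, 0.25]
print(apply_gauss(tau, 1, x))  # [2.0, 4.0, 0.0, 0.0]
```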


3.2.2 Applying Gauss Transformations 


Multiplication by a Gauss transformation is particularly simple. If $C \in \mathbb{R}^{n \times r}$
and $M_k = I - \tau e_k^T$ is a Gauss transform, then

$$M_k C = (I - \tau e_k^T)C = C - \tau(e_k^T C) = C - \tau C(k,:)$$

is an outer product update. Since τ(1:k) = 0 only C(k+1:n, :) is affected
and the update $C = M_kC$ can be computed row by row as follows:

for i = k+1:n
  C(i,:) = C(i,:) - τ(i)C(k,:)
end

This computation requires 2(n - k)r flops.
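The row-by-row update translates directly into Python (0-based indices, list-of-lists storage; the function name is hypothetical). Only rows k+1:n are touched, and each costs 2r flops, which gives the 2(n - k)r count. The sample data are the C and τ of Example 3.2.1, where k = 1 in the text (0 here).

```python
def apply_gauss_transform(tau, k, C):
    """Overwrite C with M_k C = C - tau C(k,:), using a 0-based k.

    Row k and the rows above it are unchanged because tau[0:k+1] = 0."""
    pivot_row = C[k]
    for i in range(k + 1, len(C)):
        C[i] = [c - tau[i] * p for c, p in zip(C[i], pivot_row)]
    return C


C = [[1, 4, 7], [2, 5, 8], [3, 6, 10]]
print(apply_gauss_transform([0, 1, -1], 0, C))  # [[1, 4, 7], [1, 1, 1], [4, 10, 17]]
```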


Example 3.2.1

$$C = \begin{bmatrix} 1 & 4 & 7 \\ 2 & 5 & 8 \\ 3 & 6 & 10 \end{bmatrix}, \qquad \tau = \begin{bmatrix} 0 \\ 1 \\ -1 \end{bmatrix} \quad \Longrightarrow \quad (I - \tau e_1^T)C = \begin{bmatrix} 1 & 4 & 7 \\ 1 & 1 & 1 \\ 4 & 10 & 17 \end{bmatrix}.$$


3.2.3 Roundoff Properties of Gauss Transforms


If $\hat\tau$ is the computed version of an exact Gauss vector $\tau$, then it is easy to
verify that

$$\hat\tau = \tau + e, \qquad |e| \le u|\tau|.$$

If $\hat\tau$ is used in a Gauss transform update and $fl((I - \hat\tau e_k^T)C)$ denotes the
computed result, then

$$fl\big((I - \hat\tau e_k^T)C\big) = (I - \tau e_k^T)C + E,$$

where

$$|E| \le 3u\big(|C| + |\tau|\,|C(k,:)|\big) + O(u^2).$$

Clearly, if $\tau$ has large components, then the errors in the update may be
large in comparison to |C|. For this reason, care must be exercised when
Gauss transformations are employed, a matter that is pursued in §3.4.


3.2.4 Upper Triangularizing 


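Assume that $A \in \mathbb{R}^{n \times n}$. Gauss transformations $M_1, \ldots, M_{n-1}$ can usually
be found such that $M_{n-1} \cdots M_2 M_1 A = U$ is upper triangular. To see this
we first look at the n = 3 case. Suppose

$$A = \begin{bmatrix} 1 & 4 & 7 \\ 2 & 5 & 8 \\ 3 & 6 & 10 \end{bmatrix}.$$

If

$$M_1 = \begin{bmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ -3 & 0 & 1 \end{bmatrix},$$

then

$$M_1 A = \begin{bmatrix} 1 & 4 & 7 \\ 0 & -3 & -6 \\ 0 & -6 & -11 \end{bmatrix}.$$

Likewise,

$$M_2 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -2 & 1 \end{bmatrix} \quad \Longrightarrow \quad M_2(M_1 A) = \begin{bmatrix} 1 & 4 & 7 \\ 0 & -3 & -6 \\ 0 & 0 & 1 \end{bmatrix}.$$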


Extrapolating from this example observe that during the kth step

• We are confronted with a matrix $A^{(k-1)} = M_{k-1} \cdots M_1 A$ that is
upper triangular in columns 1 to k - 1.
• The multipliers in $M_k$ are based on $A^{(k-1)}(k+1:n, k)$. In particular,
we need $a_{kk}^{(k-1)} \neq 0$ to proceed.

Noting that complete upper triangularization is achieved after n - 1 steps
we therefore obtain

k = 1
while (A(k,k) ≠ 0) & (k ≤ n-1)
  τ(k+1:n) = A(k+1:n, k)/A(k,k)
  A(k+1:n, :) = A(k+1:n, :) - τ(k+1:n)A(k,:)          (3.2.2)
  k = k + 1
end
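The while-loop (3.2.2) can be sketched in Python as follows (0-based indices; the function name and the returned step count are our own additions). It stops at the first zero pivot, so it reproduces the n = 3 example above and also shows where elimination breaks down for a matrix whose second pivot vanishes.

```python
def triangularize(A):
    """Sketch of (3.2.2) on a list-of-lists A, overwritten in place.

    Returns (A, k): k = n - 1 if all n - 1 elimination steps completed,
    otherwise the 0-based index of the first zero pivot A[k][k]."""
    n = len(A)
    k = 0
    while k < n - 1 and A[k][k] != 0:
        tau = [A[i][k] / A[k][k] for i in range(k + 1, n)]   # multipliers
        for i in range(k + 1, n):
            A[i] = [a - tau[i - k - 1] * p for a, p in zip(A[i], A[k])]
        k += 1
    return A, k


print(triangularize([[1, 4, 7], [2, 5, 8], [3, 6, 10]]))
# ([[1, 4, 7], [0.0, -3.0, -6.0], [0.0, 0.0, 1.0]], 2)
```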


The entry A(k, k) must be checked to avoid a zero divide. These quantities 
are referred to as the pivots and their relative magnitude turns out to be 
critically important. 


3.2.5 The LU Factorization 


In matrix language, if (3.2.2) terminates with k = n, then it computes
Gauss transforms $M_1, \ldots, M_{n-1}$ such that $M_{n-1} \cdots M_1 A = U$ is upper
triangular. It is easy to check that if $M_k = I - \tau^{(k)} e_k^T$, then its inverse is
prescribed by $M_k^{-1} = I + \tau^{(k)} e_k^T$ (because $e_k^T\tau^{(k)} = 0$) and so

$$A = LU \tag{3.2.3}$$

where

$$L = M_1^{-1} \cdots M_{n-1}^{-1}. \tag{3.2.4}$$

It is clear that L is a unit lower triangular matrix because each $M_k^{-1}$ is unit
lower triangular. The factorization (3.2.3) is called the LU factorization of
A.

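As suggested by the need to check for zero pivots in (3.2.2), the LU
factorization need not exist. For example, it is impossible to find $\ell_{ij}$ and
$u_{ij}$ so

$$\begin{bmatrix} 1 & 2 & 3 \\ 2 & 4 & 7 \\ 3 & 5 & 3 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ \ell_{21} & 1 & 0 \\ \ell_{31} & \ell_{32} & 1 \end{bmatrix} \begin{bmatrix} u_{11} & u_{12} & u_{13} \\ 0 & u_{22} & u_{23} \\ 0 & 0 & u_{33} \end{bmatrix}.$$

To see this equate entries and observe that we must have $u_{11} = 1$, $u_{12} = 2$,
$\ell_{21} = 2$, $u_{22} = 0$, and $\ell_{31} = 3$. But when we then look at the (3,2) entry
we obtain the contradictory equation $5 = \ell_{31}u_{12} + \ell_{32}u_{22} = 6$.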

As we now show, a zero pivot in (3.2.2) can be identified with a singular
leading principal submatrix.

Theorem 3.2.1 $A \in \mathbb{R}^{n \times n}$ has an LU factorization if $\det(A(1:k, 1:k)) \neq 0$
for k = 1:n-1. If the LU factorization exists and A is nonsingular, then
the LU factorization is unique and $\det(A) = u_{11} \cdots u_{nn}$.


Proof. Suppose k - 1 steps in (3.2.2) have been executed. At the beginning
of step k the matrix A has been overwritten by $M_{k-1} \cdots M_1 A = A^{(k-1)}$.
Note that $a_{kk}^{(k-1)}$ is the kth pivot. Since the Gauss transformations are
unit lower triangular it follows by looking at the leading k-by-k portion of
this equation that $\det(A(1:k, 1:k)) = a_{11}^{(k-1)} \cdots a_{kk}^{(k-1)}$. Thus, if A(1:k, 1:k)
is nonsingular then the kth pivot is nonzero.

As for uniqueness, if $A = L_1U_1$ and $A = L_2U_2$ are two LU factorizations
of a nonsingular A, then $L_2^{-1}L_1 = U_2U_1^{-1}$. Since $L_2^{-1}L_1$ is unit lower
triangular and $U_2U_1^{-1}$ is upper triangular, it follows that both of these
matrices must equal the identity. Hence, $L_1 = L_2$ and $U_1 = U_2$.

Finally, if A = LU then $\det(A) = \det(LU) = \det(L)\det(U) =
\det(U) = u_{11} \cdots u_{nn}$. □
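Theorem 3.2.1 suggests a simple way to evaluate a determinant: run the elimination and multiply the pivots, since det(L) = 1. A Python sketch under the theorem's assumption that no zero pivot appears among the first n - 1 steps (no pivoting; the function name is our own):

```python
def det_by_elimination(A):
    """det(A) = u_11 * ... * u_nn from Gaussian elimination without pivoting.

    Raises ValueError if a zero pivot appears in the first n - 1 steps,
    i.e., if some leading principal submatrix A(1:k, 1:k), k < n, is singular."""
    U = [list(row) for row in A]
    n = len(U)
    for k in range(n - 1):
        if U[k][k] == 0:
            raise ValueError(f"zero pivot at step {k + 1}")
        for i in range(k + 1, n):
            t = U[i][k] / U[k][k]
            U[i] = [a - t * b for a, b in zip(U[i], U[k])]
    det = 1
    for k in range(n):
        det *= U[k][k]
    return det


print(det_by_elimination([[1, 4, 7], [2, 5, 8], [3, 6, 10]]))  # -3.0
```

The matrix with no LU factorization shown above is nonsingular, yet this routine rejects it: its failure is a property of the ordering of the rows, not of the determinant.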


3.2.6 Some Practical Details 


From the practical point of view there are several improvements that can 
be made to (3.2.2). First, because zeros have already been introduced in 
columns 1 through k — 1, the Gauss transform update need only be applied 
to columns k through n. Of course, we need not even apply the kth Gauss 
transform to A(:, k) since we know the result. So the efficient thing to do 
is simply to update A(k+1:n, k+1:n). Another worthwhile observation is
that the multipliers associated with $M_k$ can be stored in the locations that
they zero, i.e., A(k+1:n, k). With these changes we obtain the following
version of (3.2.2): 


Algorithm 3.2.1 (Outer Product Gaussian Elimination) Suppose
$A \in \mathbb{R}^{n \times n}$ has the property that A(1:k, 1:k) is nonsingular for k = 1:n-1.
This algorithm computes the factorization $M_{n-1} \cdots M_1 A = U$ where U is
upper triangular and each $M_k$ is a Gauss transform. U is stored in the
upper triangle of A. The multipliers associated with $M_k$ are stored in
A(k+1:n, k), i.e., $A(k+1:n, k) = -M_k(k+1:n, k)$.

for k = 1:n-1
  rows = k+1:n
  A(rows, k) = A(rows, k)/A(k,k)
  A(rows, rows) = A(rows, rows) - A(rows, k)A(k, rows)
end

This algorithm involves $2n^3/3$ flops and it is one of several formulations of
Gaussian elimination. Note that each pass through the k-loop involves an
outer product.
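A minimal Python sketch of Algorithm 3.2.1, written with explicit loops so that the row-by-row (“kij”, see §3.2.9) organization of the outer product update is visible. Indices are 0-based, storage is a list of lists overwritten in place, and `unpack_lu` is a hypothetical helper that separates the unit lower triangular L from U. No pivoting is done, so the sketch assumes the leading principal submatrices are nonsingular.

```python
def lu_outer_product(A):
    """Algorithm 3.2.1 in place: U on and above the diagonal, multipliers below."""
    n = len(A)
    for k in range(n - 1):
        if A[k][k] == 0:
            raise ZeroDivisionError(f"zero pivot at step {k + 1}")
        for i in range(k + 1, n):
            A[i][k] = A[i][k] / A[k][k]             # multiplier, stored where it zeroes
        for i in range(k + 1, n):                   # outer product update of A(k+1:n, k+1:n)
            for j in range(k + 1, n):
                A[i][j] = A[i][j] - A[i][k] * A[k][j]
    return A


def unpack_lu(A):
    """Split the packed array into unit lower triangular L and upper triangular U."""
    n = len(A)
    L = [[1 if i == j else (A[i][j] if i > j else 0) for j in range(n)] for i in range(n)]
    U = [[A[i][j] if i <= j else 0 for j in range(n)] for i in range(n)]
    return L, U


A = lu_outer_product([[1, 4, 7], [2, 5, 8], [3, 6, 10]])
print(A)  # [[1, 4, 7], [2.0, -3.0, -6.0], [3.0, 2.0, 1.0]]
```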


3.2.7 Where is L? 
Algorithm 3.2.1 represents L in terms of the multipliers. In particular, if
$\tau^{(k)}$ is the vector of multipliers associated with $M_k$, then upon termination,
$A(k+1:n, k) = \tau^{(k)}(k+1:n)$. One of the more happy “coincidences” in matrix
computations is that if $L = M_1^{-1} \cdots M_{n-1}^{-1}$, then $L(k+1:n, k) = \tau^{(k)}(k+1:n)$. This
follows from a careful look at the product that defines L. Indeed,

$$L = (I + \tau^{(1)}e_1^T) \cdots (I + \tau^{(n-1)}e_{n-1}^T) = I + \sum_{k=1}^{n-1} \tau^{(k)} e_k^T,$$

since every cross term contains a factor $e_j^T\tau^{(k)}$ with $j \le k$, which is zero.
Since A(k+1:n, k) houses the kth vector of multipliers $\tau^{(k)}$, it follows that
A(i, k) houses $\ell_{ik}$ for all i > k.
3.2.8 Solving a Linear System 


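Once A has been factored via Algorithm 3.2.1, then L and U are represented
in the array A. We can then solve the system $Ax = b$ via the triangular
systems $Ly = b$ and $Ux = y$ by using the methods of §3.1.

Example 3.2.2 If Algorithm 3.2.1 is applied to

$$A = \begin{bmatrix} 1 & 4 & 7 \\ 2 & 5 & 8 \\ 3 & 6 & 10 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 3 & 2 & 1 \end{bmatrix} \begin{bmatrix} 1 & 4 & 7 \\ 0 & -3 & -6 \\ 0 & 0 & 1 \end{bmatrix},$$

then upon completion,

$$A = \begin{bmatrix} 1 & 4 & 7 \\ 2 & -3 & -6 \\ 3 & 2 & 1 \end{bmatrix}.$$

If $b = (1, 1, 1)^T$, then $y = (1, -1, 0)^T$ solves $Ly = b$ and $x = (-1/3, 1/3, 0)^T$ solves
$Ux = y$.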
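Given the packed array produced by Algorithm 3.2.1, the two triangular solves need no extra storage for L or U. A Python sketch (0-based; the function name is our own) that reproduces Example 3.2.2 with exact `Fraction` arithmetic:

```python
from fractions import Fraction


def lu_solve_packed(A, b):
    """Solve Ax = b from packed LU factors (unit L below, U on and above the diagonal)."""
    n = len(A)
    y = list(b)
    for i in range(n):                         # forward substitution: L y = b
        y[i] = y[i] - sum(A[i][j] * y[j] for j in range(i))
    x = list(y)
    for i in range(n - 1, -1, -1):             # back substitution: U x = y
        x[i] = (x[i] - sum(A[i][j] * x[j] for j in range(i + 1, n))) / A[i][i]
    return y, x


packed = [[1, 4, 7], [2, -3, -6], [3, 2, 1]]
y, x = lu_solve_packed(packed, [Fraction(1)] * 3)
print([str(v) for v in y], [str(v) for v in x])  # ['1', '-1', '0'] ['-1/3', '1/3', '0']
```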


3.2.9 Other Versions 


Gaussian elimination, like matrix multiplication, is a triple-loop procedure
that can be arranged in several ways. Algorithm 3.2.1 corresponds to the
“kij” version of Gaussian elimination if we compute the outer product
update row by row:

for k = 1:n-1
  A(k+1:n, k) = A(k+1:n, k)/A(k,k)
  for i = k+1:n
    for j = k+1:n
      A(i,j) = A(i,j) - A(i,k)A(k,j)
    end
  end
end


There are five other versions: kji, ikj, ijk, jik, and jki. The last of these
results in an implementation that features a sequence of gaxpy's and forward
eliminations. In this formulation, the Gauss transformations are not
immediately applied to A as they are in the outer product version. Instead,
their application is delayed. The original A(:, j) is untouched until step j.
At that point in the algorithm A(:, j) is overwritten by $M_{j-1} \cdots M_1 A(:, j)$.
The jth Gauss transformation is then computed.

To be precise, suppose 1 ≤ j ≤ n and assume that L(:, 1:j-1)
and U(1:j-1, 1:j-1) are known. This means that the first j - 1 columns
of L and U are available. To get the jth columns of L and U we equate
jth columns in the equation A = LU: A(:, j) = LU(:, j). From this we
conclude that

$$A(1:j-1, j) = L(1:j-1, 1:j-1)\,U(1:j-1, j)$$

and

$$A(j:n, j) = \sum_{k=1}^{j} L(j:n, k)\,U(k, j).$$

The first equation is a lower triangular system that can be solved for the
vector U(1:j-1, j). Once this is accomplished, the second equation can be
rearranged to produce recipes for U(j, j) and L(j+1:n, j). Indeed, if we
set

$$v(j:n) = A(j:n, j) - \sum_{k=1}^{j-1} L(j:n, k)\,U(k, j) = A(j:n, j) - L(j:n, 1:j-1)\,U(1:j-1, j),$$

then L(j+1:n, j) = v(j+1:n)/v(j) and U(j, j) = v(j). Thus, L(j+1:n, j)
is a scaled gaxpy and we obtain

L = I; U = 0
for j = 1:n
  if j = 1
    v(j:n) = A(j:n, j)
  else
    Solve L(1:j-1, 1:j-1)z = A(1:j-1, j) for z          (3.2.5)
    and set U(1:j-1, j) = z.
    v(j:n) = A(j:n, j) - L(j:n, 1:j-1)z
  end
  if j < n
    L(j+1:n, j) = v(j+1:n)/v(j)
  end
  U(j, j) = v(j)
end
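The loop above can be sketched in Python as follows, assuming (as the text does) that every pivot v(j) is nonzero. Indices are 0-based, A is left untouched, and L and U are returned as separate lists; the name `lu_gaxpy` is our own. Because L is unit lower triangular, the forward elimination for z needs no divisions.

```python
def lu_gaxpy(A):
    """Sketch of the jki (gaxpy) form (3.2.5): build L and U one column at a time."""
    n = len(A)
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    U = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Forward elimination: solve L(0:j, 0:j) z = A(0:j, j); L has a unit diagonal.
        z = []
        for i in range(j):
            z.append(A[i][j] - sum(L[i][k] * z[k] for k in range(i)))
        for i in range(j):
            U[i][j] = z[i]
        # Gaxpy: v(j:n) = A(j:n, j) - L(j:n, 0:j) z.
        v = [A[i][j] - sum(L[i][k] * z[k] for k in range(j)) for i in range(j, n)]
        U[j][j] = v[0]
        for i in range(j + 1, n):          # empty when j = n - 1
            L[i][j] = v[i - j] / v[0]
    return L, U


L, U = lu_gaxpy([[1, 4, 7], [2, 5, 8], [3, 6, 10]])
print(L)
print(U)
```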


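This arrangement of Gaussian elimination is rich in forward eliminations
and gaxpy operations and, like Algorithm 3.2.1, requires $2n^3/3$ flops.

3.2.10 Block LU


It is possible to organize Gaussian elimination so that matrix multiplication
becomes the dominant operation. The key to the derivation of this block
procedure is to partition $A \in \mathbb{R}^{n \times n}$ as follows

$$A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}, \qquad A_{11} \in \mathbb{R}^{r \times r},\; A_{22} \in \mathbb{R}^{(n-r) \times (n-r)},$$

where r is a blocking parameter. Suppose we compute the LU factorization
$L_{11}U_{11} = A_{11}$ and then solve the multiple right-hand-side triangular systems
$L_{11}U_{12} = A_{12}$ and $L_{21}U_{11} = A_{21}$ for $U_{12}$ and $L_{21}$ respectively. It follows
that

$$\begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} = \begin{bmatrix} L_{11} & 0 \\ L_{21} & I_{n-r} \end{bmatrix} \begin{bmatrix} I_r & 0 \\ 0 & \tilde A \end{bmatrix} \begin{bmatrix} U_{11} & U_{12} \\ 0 & I_{n-r} \end{bmatrix}$$

where $\tilde A = A_{22} - L_{21}U_{12}$. The matrix $\tilde A$ is the Schur complement of $A_{11}$
with respect to A. Note that if $\tilde A = L_{22}U_{22}$ is the LU factorization of $\tilde A$,
then

$$\begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} = \begin{bmatrix} L_{11} & 0 \\ L_{21} & L_{22} \end{bmatrix} \begin{bmatrix} U_{11} & U_{12} \\ 0 & U_{22} \end{bmatrix}$$

is the LU factorization of A. Thus, after $L_{11}$, $L_{21}$, $U_{11}$ and $U_{12}$ are
computed, we repeat the process on the level-3 updated (2,2) block $\tilde A$.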


Algorithm 3.2.2 (Block Outer Product LU) Suppose $A \in \mathbb{R}^{n \times n}$
and that $\det(A(1:k, 1:k))$ is nonzero for k = 1:n-1. Assume that r satisfies
1 ≤ r ≤ n. The following algorithm computes A = LU via rank-r updates.
Upon completion, A(i, j) is overwritten with L(i, j) for i > j and A(i, j) is
overwritten with U(i, j) if j ≥ i.

λ = 1
while λ ≤ n
  μ = min(n, λ + r - 1)
  Use Algorithm 3.2.1 to overwrite A(λ:μ, λ:μ)
    with its LU factors L̃ and Ũ.
  Solve L̃Z = A(λ:μ, μ+1:n) for Z and overwrite
    A(λ:μ, μ+1:n) with Z.
  Solve WŨ = A(μ+1:n, λ:μ) for W and overwrite
    A(μ+1:n, λ:μ) with W.
  A(μ+1:n, μ+1:n) = A(μ+1:n, μ+1:n) - WZ
  λ = μ + 1
end

This algorithm involves $2n^3/3$ flops.

Recalling the discussion in §3.1.5, let us consider the level-3 fraction 
for this procedure assuming that r is large enough so that the underlying 
computer is able to compute the matrix multiply update A(μ+1:n, μ+1:n) =
A(μ+1:n, μ+1:n) - WZ at “level-3 speed.” Assume for clarity
that n = rN. The only flops that are not level-3 flops occur in the context
of the r-by-r LU factorizations A(λ:μ, λ:μ) = L̃Ũ. Since there are N such
systems solved in the overall computation, we see that the level-3 fraction
is given by

$$1 - \frac{N(2r^3/3)}{2n^3/3} = 1 - \frac{1}{N^2}.$$

Thus, for large N almost all arithmetic takes place in the context of matrix
multiplication. As we have mentioned, this ensures high performance on a
wide range of computing environments.
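A sketch of Algorithm 3.2.2 in pure Python (0-based indices; `block_lu` is a hypothetical name). In a real implementation step (4) would be a call to an optimized matrix-multiply routine, which is where the level-3 speed comes from; here it is written as plain loops so that the sketch is self-contained. With r = 1 it reduces to outer product elimination, and with r = n only the unblocked step (1) does any work.

```python
def block_lu(A, r):
    """Block outer product LU (sketch of Algorithm 3.2.2), in place, no pivoting.

    Assumes 1 <= r <= n and nonsingular leading principal submatrices. On return
    the strict lower triangle holds L (unit diagonal implied) and the upper
    triangle holds U."""
    n = len(A)
    lam = 0
    while lam < n:
        mu = min(n, lam + r)  # current block is rows/columns lam..mu-1
        # (1) Algorithm 3.2.1 on the diagonal block A(lam:mu, lam:mu).
        for k in range(lam, mu - 1):
            for i in range(k + 1, mu):
                A[i][k] /= A[k][k]
                for j in range(k + 1, mu):
                    A[i][j] -= A[i][k] * A[k][j]
        # (2) Solve L~ Z = A(lam:mu, mu:n); L~ is unit lower triangular.
        for i in range(lam, mu):
            for k in range(lam, i):
                for j in range(mu, n):
                    A[i][j] -= A[i][k] * A[k][j]
        # (3) Solve W U~ = A(mu:n, lam:mu); U~ is upper triangular.
        for i in range(mu, n):
            for k in range(lam, mu):
                for j in range(lam, k):
                    A[i][k] -= A[i][j] * A[j][k]
                A[i][k] /= A[k][k]
        # (4) Level-3 update A(mu:n, mu:n) = A(mu:n, mu:n) - W Z.
        for i in range(mu, n):
            for j in range(mu, n):
                A[i][j] -= sum(A[i][k] * A[k][j] for k in range(lam, mu))
        lam = mu
    return A
```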


3.2.11 The LU Factorization of a Rectangular Matrix 


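The LU factorization of a rectangular matrix $A \in \mathbb{R}^{m \times n}$ can also be performed.
The m > n case is illustrated by

$$\begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 3 & 1 \\ 5 & 2 \end{bmatrix} \begin{bmatrix} 1 & 2 \\ 0 & -2 \end{bmatrix}$$

while

$$\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 4 & 1 \end{bmatrix} \begin{bmatrix} 1 & 2 & 3 \\ 0 & -3 & -6 \end{bmatrix}$$

depicts the m < n situation. The LU factorization of $A \in \mathbb{R}^{m \times n}$ is guaranteed
to exist if A(1:k, 1:k) is nonsingular for k = 1:min(m, n).

The square LU factorization algorithms above need only minor modification
to handle the rectangular case. For example, to handle the m > n
case we modify Algorithm 3.2.1 as follows:

for k = 1:n
  rows = k+1:m
  A(rows, k) = A(rows, k)/A(k,k)
  if k < n
    cols = k+1:n
    A(rows, cols) = A(rows, cols) - A(rows, k)A(k, cols)
  end
end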
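The m > n modification can be sketched in Python as follows (0-based indices, in place, no pivoting; `lu_tall` is our name). On return the upper triangle of A(1:n, 1:n) holds U and the strict lower part of the m-by-n array holds the unit lower trapezoidal L.

```python
def lu_tall(A):
    """LU of an m-by-n list-of-lists A with m >= n, overwritten in place (no pivoting)."""
    m, n = len(A), len(A[0])
    for k in range(n):                     # all n columns need multipliers when m > n
        for i in range(k + 1, m):
            A[i][k] /= A[k][k]
            for j in range(k + 1, n):      # empty when k = n - 1 ("if k < n" in the text)
                A[i][j] -= A[i][k] * A[k][j]
    return A


print(lu_tall([[1, 2], [3, 4], [5, 6]]))  # [[1, 2], [3.0, -2.0], [5.0, 2.0]]
```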


This algorithm requires $mn^2 - n^3/3$ flops.

3.2.12 A Note on Failure


As we know, Gaussian elimination fails unless the first n - 1 principal
submatrices are nonsingular. This rules out some very simple matrices,
e.g.,

$$A = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}.$$

While A has perfect 2-norm condition, it fails to have an LU factorization
because it has a singular leading principal submatrix.

Clearly, modifications are necessary if Gaussian elimination is to be
effectively used in general linear system solving. The error analysis in the
following section suggests the needed modifications.


Problems


P3.2.1 Suppose the entries of $A(\epsilon) \in \mathbb{R}^{n \times n}$ are continuously differentiable functions of
the scalar ε. Assume that $A \equiv A(0)$ and all its principal submatrices are nonsingular.
Show that for sufficiently small ε, the matrix A(ε) has an LU factorization A(ε) =
L(ε)U(ε) and that L(ε) and U(ε) are both continuously differentiable.


P3.2.2 Suppose we partition $A \in \mathbb{R}^{n \times n}$

$$A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}$$

where $A_{11}$ is r-by-r. Assume that $A_{11}$ is nonsingular. The matrix $S = A_{22} - A_{21}A_{11}^{-1}A_{12}$
is called the Schur complement of $A_{11}$ in A. Show that if $A_{11}$ has an LU factorization,
then after r steps of Algorithm 3.2.1, A(r+1:n, r+1:n) houses S. How could S be
obtained after r steps of (3.2.5)?
P3.2.3 Suppose $A \in \mathbb{R}^{n \times n}$ has an LU factorization. Show how $Ax = b$ can be solved
without storing the multipliers by computing the LU factorization of the n-by-(n+1)
matrix [A b].
P3.2.4 Describe a variant of Gaussian elimination that introduces zeros into the columns
of A in the order n:-1:2 and which produces the factorization A = UL where U is unit
upper triangular and L is lower triangular.
P3.2.5 Matrices in $\mathbb{R}^{n \times n}$ of the form $N(y, k) = I - ye_k^T$ where $y \in \mathbb{R}^n$ are said to
be Gauss-Jordan transformations. (a) Give a formula for $N(y, k)^{-1}$ assuming it exists.
(b) Given $x \in \mathbb{R}^n$, under what conditions can y be found so $N(y, k)x = e_k$? (c) Give
an algorithm using Gauss-Jordan transformations that overwrites A with $A^{-1}$. What
conditions on A ensure the success of your algorithm?
P3.2.6 Extend (3.2.5) so that it can also handle the case when A has more rows than
columns.
P3.2.7 Show how A can be overwritten with L and U in (3.2.5). Organize the three
loops so that unit stride access prevails.
P3.2.8 Develop a version of Gaussian elimination in which the innermost of the three
loops oversees a dot product.

Notes and References for Sec. 3.2


Schur complements (P3.2.2) arise in many applications. For a survey of both practical 
and theoretical interest, see 


R.W. Cottle (1974). “Manifestations of the Schur Complement,” Lin. Alg. and Its 
Applic. 8, 189-211. 


Schur complements are known as “Gauss transforms” in some application areas. The
use of Gauss-Jordan transformations (P3.2.5) is detailed in Fox (1964). See also


T. Dekker and W. Hoffman (1989). “Rehabilitation of the Gauss-Jordan Algorithm,”
Numer. Math. 54, 591-599.


As we mentioned, inner product versions of Gaussian elimination have been known and
used for some time. The names of Crout and Doolittle are associated with these ijk
techniques. They were popular during the days of desk calculators because there are
far fewer intermediate results than in Gaussian elimination. These methods still have
attraction because they can be implemented with accumulated inner products. For
remarks along these lines see Fox (1964) as well as Stewart (1973, pp. 131-39). See also:


G.E. Forsythe (1960). “Crout with Pivoting,” Comm. ACM 3, 507-8.
W.M. McKeeman (1962). “Crout with Equilibration and Iteration,” Comm. ACM 5,
553-55.


Loop orderings and block issues in LU computations are discussed in 


J.J. Dongarra, F.G. Gustavson, and A. Karp (1984). “Implementing Linear Algebra 
Algorithms for Dense Matrices on a Vector Pipeline Machine,” SIAM Review 26, 
91-112. 

J.M. Ortega (1988). “The ijk Forms of Factorization Methods I: Vector Computers,”
Parallel Computing 7, 135-147.

D.H. Bailey, K. Lee, and H.D. Simon (1991). “Using Strassen's Algorithm to Accelerate
the Solution of Linear Systems,” J. Supercomputing 4, 357-371.

J.W. Demmel, N.J. Higham, and R.S. Schreiber (1995). "Stability of Block LU Factor- 
ization,” Numer. Lin. Alg. with Applic. 2, 173-190. 


3.3 Roundoff Analysis of Gaussian Elimination


We now assess the effect of rounding errors when the algorithms in the
previous two sections are used to solve the linear system $Ax = b$. A much
more detailed treatment of roundoff error in Gaussian elimination is given
in Higham (1996).

Before we proceed with the analysis, it is useful to consider the nearly
ideal situation in which no roundoff occurs during the entire solution process
except when A and b are stored. Thus, if $fl(b) = b + e$ and the stored matrix
$fl(A) = A + E$ is nonsingular, then we are assuming that the computed
solution $\hat x$ satisfies

$$(A + E)\hat x = b + e, \qquad \|E\|_\infty \le u\|A\|_\infty, \quad \|e\|_\infty \le u\|b\|_\infty. \tag{3.3.1}$$

That is, $\hat x$ solves a “nearby” system exactly. Moreover, if $u\kappa_\infty(A) \le 1/2$
(say), then by using Theorem 2.7.2, it can be shown that

$$\frac{\|x - \hat x\|_\infty}{\|x\|_\infty} \le 4u\kappa_\infty(A). \tag{3.3.2}$$

The bounds (3.3.1) and (3.3.2) are “best possible” norm bounds. No general
∞-norm error analysis of a linear equation solver that requires the storage of
A and b can render sharper bounds. As a consequence, we cannot justifiably
criticize an algorithm for returning an inaccurate $\hat x$ if A is ill-conditioned
relative to the machine precision, e.g., $u\kappa_\infty(A) \approx 1$.


3.3.1 Errors in the LU Factorization 


Let us see how the error bounds for Gaussian elimination compare with
the ideal bounds above. We work with the infinity norm for convenience
and focus our attention on Algorithm 3.2.1, the outer product version.
The error bounds that we derive also apply to the gaxpy formulation (3.2.5).

Our first task is to quantify the roundoff errors associated with the
computed triangular factors.


Theorem 3.3.1 Assume that A is an n-by-n matrix of floating point numbers.
If no zero pivots are encountered during the execution of Algorithm
3.2.1, then the computed triangular matrices $\hat L$ and $\hat U$ satisfy

$$\hat L \hat U = A + H \tag{3.3.3}$$

$$|H| \le 3(n-1)u\big(|A| + |\hat L|\,|\hat U|\big) + O(u^2). \tag{3.3.4}$$


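Proof. The proof is by induction on n. The theorem obviously holds for
n = 1. Assume it holds for all (n-1)-by-(n-1) floating point matrices. If

$$A = \begin{bmatrix} \alpha & w^T \\ v & B \end{bmatrix}, \qquad \alpha \in \mathbb{R},\; v, w \in \mathbb{R}^{n-1},\; B \in \mathbb{R}^{(n-1) \times (n-1)},$$

then $\hat z = fl(v/\alpha)$ and $\hat A_1 = fl(B - \hat z w^T)$ are computed in the first step of
the algorithm. We therefore have

$$\hat z = \frac{1}{\alpha}v + f, \qquad |f| \le u\frac{|v|}{|\alpha|} \tag{3.3.5}$$

and

$$\hat A_1 = B - \hat z w^T + F, \qquad |F| \le 2u\big(|B| + |\hat z|\,|w|^T\big) + O(u^2). \tag{3.3.6}$$

The algorithm now proceeds to calculate the LU factorization of $\hat A_1$. By
induction, we compute approximate factors $\hat L_1$ and $\hat U_1$ for $\hat A_1$ that satisfy

$$\hat L_1 \hat U_1 = \hat A_1 + H_1, \tag{3.3.7}$$

$$|H_1| \le 3(n-2)u\big(|\hat A_1| + |\hat L_1|\,|\hat U_1|\big) + O(u^2). \tag{3.3.8}$$

Thus,

$$\hat L \hat U \equiv \begin{bmatrix} 1 & 0 \\ \hat z & \hat L_1 \end{bmatrix} \begin{bmatrix} \alpha & w^T \\ 0 & \hat U_1 \end{bmatrix} = A + \begin{bmatrix} 0 & 0 \\ \alpha f & H_1 + F \end{bmatrix} \equiv A + H.$$

From (3.3.6) it follows that

$$|\hat A_1| \le (1 + 2u)\big(|B| + |\hat z|\,|w|^T\big) + O(u^2),$$

and therefore by using (3.3.7) and (3.3.8) we have

$$|H_1 + F| \le 3(n-1)u\big(|B| + |\hat z|\,|w|^T + |\hat L_1|\,|\hat U_1|\big) + O(u^2).$$

Since $|\alpha f| \le u|v|$ it is easy to verify that

$$|H| \le 3(n-1)u\left\{ \begin{bmatrix} |\alpha| & |w|^T \\ |v| & |B| \end{bmatrix} + \begin{bmatrix} 1 & 0 \\ |\hat z| & |\hat L_1| \end{bmatrix} \begin{bmatrix} |\alpha| & |w|^T \\ 0 & |\hat U_1| \end{bmatrix} \right\} + O(u^2),$$

thereby proving the theorem. □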


We mention that if A is m-by-n, then the theorem applies with n in (3.3.4) 
replaced by the smaller of n and m . 


3.3.2 Triangular Solving with Inexact Triangles 


We next examine the effect of roundoff error when $\hat L$ and $\hat U$ are used by the
triangular system solvers of §3.1.


Theorem 3.3.2 Let $\hat L$ and $\hat U$ be the computed LU factors of the n-by-n
floating point matrix A obtained by either Algorithm 3.2.1 or (3.2.5). Suppose
the methods of §3.1 are used to produce the computed solution $\hat y$ to $\hat L y = b$
and the computed solution $\hat x$ to $\hat U x = \hat y$. Then $(A + E)\hat x = b$ with

$$|E| \le nu\big(3|A| + 5|\hat L|\,|\hat U|\big) + O(u^2). \tag{3.3.9}$$

Proof. From (3.1.1) and (3.1.2) we have

$$(\hat L + F)\hat y = b, \qquad |F| \le nu|\hat L| + O(u^2)$$

$$(\hat U + G)\hat x = \hat y, \qquad |G| \le nu|\hat U| + O(u^2)$$

and thus

$$(\hat L + F)(\hat U + G)\hat x = (\hat L\hat U + F\hat U + \hat L G + FG)\hat x = b.$$

From Theorem 3.3.1

$$\hat L\hat U = A + H,$$

with $|H| \le 3(n-1)u(|A| + |\hat L|\,|\hat U|) + O(u^2)$, and so by defining

$$E = H + F\hat U + \hat L G + FG$$

we find $(A + E)\hat x = b$. Moreover,

$$\begin{aligned} |E| &\le |H| + |F|\,|\hat U| + |\hat L|\,|G| + O(u^2) \\ &\le 3nu\big(|A| + |\hat L|\,|\hat U|\big) + 2nu\big(|\hat L|\,|\hat U|\big) + O(u^2). \end{aligned}$$

□


Were it not for the possibility of a large $|\hat L|\,|\hat U|$ term, (3.3.9) would compare
favorably with the ideal bound in (3.3.1). (The factor n is of no consequence,
cf. the Wilkinson quotation in §2.4.6.) Such a possibility exists, for
there is nothing in Gaussian elimination to rule out the appearance of small
pivots. If a small pivot is encountered, then we can expect large numbers
to be present in $\hat L$ and $\hat U$.

We stress that small pivots are not necessarily due to ill-conditioning as
the example

$$A = \begin{bmatrix} \epsilon & 1 \\ 1 & 0 \end{bmatrix}$$

bears out: for small ε this A is well conditioned, yet $\ell_{21} = 1/\epsilon$ and $u_{22} = -1/\epsilon$.
Thus, Gaussian elimination can give arbitrarily poor results, even for
well-conditioned problems. The method is unstable.

In order to repair this shortcoming of the algorithm, it is necessary to
introduce row and/or column interchanges during the elimination process
with the intention of keeping the numbers that arise during the calculation
suitably bounded. This idea is pursued in the next section.


Example 3.3.1 Suppose β = 10, t = 3, floating point arithmetic is used to solve:

$$\begin{bmatrix} .001 & 1.00 \\ 1.00 & 2.00 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 1.00 \\ 3.00 \end{bmatrix}.$$

Applying Gaussian elimination we get

$$\hat L = \begin{bmatrix} 1 & 0 \\ 1000 & 1 \end{bmatrix}, \qquad \hat U = \begin{bmatrix} .001 & 1 \\ 0 & -1000 \end{bmatrix}.$$

Moreover, the bounding matrix in (3.3.4) is not a severe overestimate of |H|.
If we go on to solve the problem using the triangular system solvers of §3.1,
then using the same precision arithmetic we obtain a computed solution $\hat x = (0, 1)^T$.
This is in contrast to the exact solution $x = (1.002\ldots, .998\ldots)^T$.


Problems


P3.3.1 Show that if we drop the assumption that A is a floating point matrix in
Theorem 3.3.1, then (3.3.4) holds with the coefficient “3” replaced by “4.”
P3.3.2 Suppose A is an n-by-n matrix and that $\hat L$ and $\hat U$ are produced by Algorithm
3.2.1. (a) How many flops are required to compute $\|\,|\hat L|\,|\hat U|\,\|_\infty$? (b) Show $fl(|\hat L|\,|\hat U|) \le
(1 + 2nu)|\hat L|\,|\hat U| + O(u^2)$.
P3.3.3 Suppose $x = A^{-1}b$. Show that if $e = x - \hat x$ (the error) and $r = b - A\hat x$ (the
residual), then

$$\frac{\|r\|}{\|A\|} \le \|e\| \le \|A^{-1}\|\,\|r\|.$$

Assume consistency between the matrix and vector norm.
P3.3.4 Using 2-digit, base 10, floating point arithmetic, compute the LU factorization 
of 


For this example, what is the matrix H in (3.3.3)? 


Notes and References for Sec. 3.3 
The original roundoff analysis of Gaussian elimination appears in 


J.H. Wilkinson (1961). “Error Analysis of Direct Methods of Matrix Inversion,” J. ACM
8, 281-330.


Various improvements in the bounds and simplifications in the analysis have occurred
over the years. See


B.A. Chartres and J.C. Geuder (1967). “Computable Error Bounds for Direct Solution
of Linear Equations,” J. ACM 14, 63-71.

J.K. Reid (1971). “A Note on the Stability of Gaussian Elimination,” J. Inst. Math.
Applic. 8, 374-75.

C.C. Paige (1973). “An Error Analysis of a Method for Solving Matrix Equations,”
Math. Comp. 27, 355-59.

C. de Boor and A. Pinkus (1977). “A Backward Error Analysis for Totally Positive
Linear Systems,” Numer. Math. 27, 485-90.

H.H. Robertson (1977). “The Accuracy of Error Estimates for Systems of Linear Algebraic
Equations,” J. Inst. Math. Applic. 20, 409-14.

J.J. Du Croz and N.J. Higham (1992). “Stability of Methods for Matrix Inversion,” IMA
J. Num. Anal. 12, 1-19.

3.4 Pivoting


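The analysis in the previous section shows that we must take steps to ensure
that no large entries appear in the computed triangular factors $\hat L$ and $\hat U$.
The example

$$A = \begin{bmatrix} .0001 & 1 \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 10000 & 1 \end{bmatrix} \begin{bmatrix} .0001 & 1 \\ 0 & -9999 \end{bmatrix} = LU$$

correctly identifies the source of the difficulty: relatively small pivots. A
way out of this difficulty is to interchange rows. In our example, if P is the
permutation

$$P = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$$

then

$$PA = \begin{bmatrix} 1 & 1 \\ .0001 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ .0001 & 1 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 0 & .9999 \end{bmatrix} = LU.$$

Now the triangular factors are comprised of acceptably small elements.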
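The effect of the interchange can be seen in IEEE double precision with a hypothetical 2-by-2 solver that optionally swaps the rows whenever $|a_{21}| > |a_{11}|$ before eliminating, so that the multiplier is at most 1 in magnitude. We use a pivot of 1e-20 rather than .0001 (our choice), because .0001 is not small enough to cause visible trouble in double precision. The exact solution of this system is very close to (1, 1).

```python
def solve_2x2(A, b, pivot=True):
    """Solve a 2-by-2 system by elimination, optionally with a row interchange.

    With pivot=True the rows are swapped when |a21| > |a11| (the permutation P),
    so the multiplier l21 satisfies |l21| <= 1."""
    if pivot and abs(A[1][0]) > abs(A[0][0]):
        A, b = [A[1], A[0]], [b[1], b[0]]
    l21 = A[1][0] / A[0][0]
    u22 = A[1][1] - l21 * A[0][1]       # without the swap, A[1][1] is swamped here
    x2 = (b[1] - l21 * b[0]) / u22
    x1 = (b[0] - A[0][1] * x2) / A[0][0]
    return [x1, x2]


A, b = [[1e-20, 1.0], [1.0, 1.0]], [1.0, 2.0]
print(solve_2x2(A, b, pivot=False))  # [0.0, 1.0]
print(solve_2x2(A, b))               # [1.0, 1.0]
```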

In this section we show how to determine a permuted version of A that 
has a reasonably stable LU factorization. There are several ways to do 
this and they each correspond to a different pivoting strategy. We focus 
on partial pivoting and complete pivoting. The efficient implementation 
of these strategies and their properties are discussed. We begin with a 
discussion of permutation matrix manipulation. 


3.4.1 Permutation Matrices 


The stabilizations of Gaussian elimination that are developed in this sec- 
tion involve data movements such as the interchange of two matrix rows. 
In keeping with our desire to describe all computations in “matrix terms,” 
it is necessary to acquire a familiarity with permutation matrices. A permutation
matrix is just the identity with its rows re-ordered, e.g.,

$$P = \begin{bmatrix} 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix}.$$

An n-by-n permutation matrix should never be explicitly stored. It is much
more efficient to represent a general permutation matrix P with an integer
n-vector p. One way to do this is to let p(k) be the column index of the
sole “1” in P's kth row. Thus, p = [4 1 3 2] is the appropriate encoding of
the above P. It is also possible to encode P on the basis of where the “1”
occurs in each column, e.g., p = [2 4 3 1].

If P is a permutation and A is a matrix, then PA is a row permuted
version of A and AP is a column permuted version of A. Permutation
matrices are orthogonal and so if P is a permutation, then $P^{-1} = P^T$. A
product of permutation matrices is a permutation matrix.
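The two encodings can be checked in Python. The helpers below (hypothetical names; 0-based, so the vectors are shifted by one before printing) recover p = [4 1 3 2] and [2 4 3 1] from the explicit P, and form PA using only the integer vector: row k of PA is row p(k) of A.

```python
def row_encoding(P):
    """p[k] = column index of the sole 1 in row k of P."""
    return [row.index(1) for row in P]


def column_encoding(P):
    """q[j] = row index of the sole 1 in column j of P."""
    return [[P[i][j] for i in range(len(P))].index(1) for j in range(len(P))]


def permute_rows(p, A):
    """Form PA from the row encoding: row k of PA is row p[k] of A."""
    return [list(A[pk]) for pk in p]


P = [[0, 0, 0, 1], [1, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0]]
print([k + 1 for k in row_encoding(P)])     # [4, 1, 3, 2]
print([i + 1 for i in column_encoding(P)])  # [2, 4, 3, 1]
```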

In this section we are particularly interested in interchange permuta- 
tions. These are permutations obtained by merely swapping two rows in 
the identity, e.g.,

$$E = \begin{bmatrix} 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 \end{bmatrix}.$$

Interchange permutations can be used to describe row and column swapping.
With the above 4-by-4 example, EA is A with rows 1 and 4 interchanged.
Likewise, AE is A with columns 1 and 4 swapped.

If $P = E_n \cdots E_1$ and each $E_k$ is the identity with rows k and p(k)
interchanged, then p(1:n) is a useful vector encoding of P. Indeed, $x \in \mathbb{R}^n$
can be overwritten by Px as follows:

for k = 1:n
  x(k) ↔ x(p(k))
end

Here, the “↔” notation means “swap contents.” Since each $E_k$ is symmetric
and $P^T = E_1 \cdots E_n$, the representation can also be used to overwrite x with
$P^Tx$:

for k = n:-1:1
  x(k) ↔ x(p(k))
end
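Both loops translate directly into Python (0-based; the function names are our own). Python's tuple assignment performs the “swap contents” operation, and no floating point arithmetic is involved. The sample vector p = [2, 2, 2] encodes E_1 swapping entries 0 and 2, E_2 swapping 1 and 2, and E_3 = I.

```python
def apply_P(p, x):
    """Overwrite x with P x, where P = E_n ... E_1 and E_k swaps entries k and p[k].

    E_1 is applied first because it is the rightmost factor."""
    for k in range(len(p)):
        x[k], x[p[k]] = x[p[k]], x[k]
    return x


def apply_PT(p, x):
    """Overwrite x with P^T x = E_1 ... E_n x (each E_k is its own inverse)."""
    for k in reversed(range(len(p))):
        x[k], x[p[k]] = x[p[k]], x[k]
    return x


p = [2, 2, 2]
x = apply_P(p, ["a", "b", "c"])
print(x)               # ['c', 'a', 'b']
print(apply_PT(p, x))  # ['a', 'b', 'c']
```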


It should be noted that no floating point arithmetic is involved in a permu- 
tation operation. However, permutation matrix operations often involve the 
irregular movement of data and can represent a significant computational 
overhead. 


3.4.2 Partial Pivoting: The Basic Idea 


We show how interchange permutations can be used in LU computations to
guarantee that no multiplier is greater than one in absolute value. Suppose

$$A = \begin{bmatrix} 3 & 17 & 10 \\ 2 & 4 & -2 \\ 6 & 18 & -12 \end{bmatrix}.$$

To get the smallest possible multipliers in the first Gauss transform using
row interchanges we need $a_{11}$ to be the largest entry in the first column.
Thus, if $E_1$ is the interchange permutation

$$E_1 = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix}$$

then

$$E_1A = \begin{bmatrix} 6 & 18 & -12 \\ 2 & 4 & -2 \\ 3 & 17 & 10 \end{bmatrix}$$

and

$$M_1 = \begin{bmatrix} 1 & 0 & 0 \\ -1/3 & 1 & 0 \\ -1/2 & 0 & 1 \end{bmatrix} \quad \Longrightarrow \quad M_1E_1A = \begin{bmatrix} 6 & 18 & -12 \\ 0 & -2 & 2 \\ 0 & 8 & 16 \end{bmatrix}.$$

Now to get the smallest possible multiplier in $M_2$ we need to swap rows 2
and 3. Thus, if

$$E_2 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix} \quad \text{and} \quad M_2 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 1/4 & 1 \end{bmatrix}$$

then

$$M_2E_2M_1E_1A = \begin{bmatrix} 6 & 18 & -12 \\ 0 & 8 & 16 \\ 0 & 0 & 6 \end{bmatrix}.$$

The example illustrates the basic idea behind the row interchanges. In
general we have:

for k = 1:n-1
  Determine an interchange matrix E_k with E_k(1:k, 1:k) = I_k
    such that if z is the kth column of E_kA, then
    |z(k)| = ||z(k:n)||_∞.
  A = E_kA
  Determine the Gauss transform M_k such that if v is the
    kth column of M_kA, then v(k+1:n) = 0.
  A = M_kA
end

This particular row interchange strategy is called partial pivoting. Upon
completion we emerge with $M_{n-1}E_{n-1} \cdots M_1E_1A = U$, an upper triangular
matrix.

As a consequence of the partial pivoting, no multiplier is larger than
one in absolute value. This is because

$$|(E_kM_{k-1} \cdots M_1E_1A)_{kk}| = \max_{k \le i \le n} |(E_kM_{k-1} \cdots M_1E_1A)_{ik}|$$

for k = 1:n-1. Thus, partial pivoting effectively guards against arbitrarily
large multipliers.


3.4.3 Partial Pivoting Details 


We are now set to detail the overall Gaussian elimination with partial pivoting algorithm.


Algorithm 3.4.1 (Gauss Elimination with Partial Pivoting) If $A \in \mathbb{R}^{n\times n}$, then this algorithm computes Gauss transforms $M_1, \ldots, M_{n-1}$ and interchange permutations $E_1, \ldots, E_{n-1}$ such that $M_{n-1}E_{n-1}\cdots M_1E_1A = U$ is upper triangular. No multiplier is bigger than 1 in absolute value. $A(1:k,k)$ is overwritten by $U(1:k,k)$, $k = 1:n$. $A(k+1:n,k)$ is overwritten by $-M_k(k+1:n,k)$, $k = 1:n-1$. The integer vector $p(1:n-1)$ defines the interchange permutations. In particular, $E_k$ interchanges rows k and $p(k)$, $k = 1:n-1$.


for k = 1:n-1
    Determine μ with k ≤ μ ≤ n so |A(μ,k)| = ‖A(k:n,k)‖_∞
    A(k,k:n) ↔ A(μ,k:n)
    p(k) = μ
    if A(k,k) ≠ 0
        rows = k+1:n
        A(rows,k) = A(rows,k)/A(k,k)
        A(rows,rows) = A(rows,rows) - A(rows,k)A(k,rows)
    end
end


Note that if $\|A(k:n,k)\|_\infty = 0$ in step k, then in exact arithmetic the first k columns of A are linearly dependent. In contrast to Algorithm 3.2.1, this poses no difficulty. We merely skip over the zero pivot.

The overhead associated with partial pivoting is minimal from the standpoint of floating point arithmetic as there are only $O(n^2)$ comparisons associated with the search for the pivots. The overall algorithm involves $2n^3/3$ flops.
To solve the linear system $Ax = b$ after invoking Algorithm 3.4.1 we

• Compute $y = M_{n-1}E_{n-1}\cdots M_1E_1b$.
• Solve the upper triangular system $Ux = y$.
All the information necessary to do this is contained in the array A and the pivot vector p. Indeed, the calculation

for k = 1:n-1
    b(k) ↔ b(p(k))
    b(k+1:n) = b(k+1:n) - b(k)A(k+1:n,k)
end

overwrites b with $M_{n-1}E_{n-1}\cdots M_1E_1b$.
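The following Python sketch mirrors Algorithm 3.4.1 and the solution loop above under a few assumptions of ours: 0-based indices, a list-of-lists matrix, and the hypothetical names `gepp` and `gepp_solve`. Exact `Fraction` arithmetic is used in the usage lines so the result can be compared entry by entry with Example 3.4.1, where the book's 1-based pivot vector p = [3, 3] appears as [2, 2].

```python
from fractions import Fraction


def gepp(A):
    """Sketch of Algorithm 3.4.1 (0-based indices) on a copy of A.

    On return the upper triangle holds U, the strict lower triangle holds the
    multipliers, and p[k] is the row interchanged with row k at step k. As in
    the algorithm, only columns k:n are swapped at step k.
    """
    A = [row[:] for row in A]
    n = len(A)
    p = []
    for k in range(n - 1):
        mu = max(range(k, n), key=lambda i: abs(A[i][k]))  # pivot row
        A[k][k:], A[mu][k:] = A[mu][k:], A[k][k:]          # A(k,k:n) <-> A(mu,k:n)
        p.append(mu)
        if A[k][k] != 0:                                   # skip a zero pivot
            for i in range(k + 1, n):
                A[i][k] = A[i][k] / A[k][k]                # multiplier
                for j in range(k + 1, n):
                    A[i][j] -= A[i][k] * A[k][j]           # rank-one update
    return A, p


def gepp_solve(A, p, b):
    """Solve Ax = b from the output of gepp: apply E_k and M_k to b, then Ux = y."""
    n = len(A)
    b = list(b)
    for k in range(n - 1):
        b[k], b[p[k]] = b[p[k]], b[k]                      # b(k) <-> b(p(k))
        for i in range(k + 1, n):
            b[i] -= b[k] * A[i][k]
    for i in reversed(range(n)):                           # back substitution
        b[i] = (b[i] - sum(A[i][j] * b[j] for j in range(i + 1, n))) / A[i][i]
    return b


A = [[Fraction(v) for v in row] for row in [[3, 17, 10], [2, 4, -2], [6, 18, -12]]]
LU, p = gepp(A)
x = gepp_solve(LU, p, [67, 4, 6])   # right-hand side chosen so that x = (1, 2, 3)
```

Since the multipliers are never moved after they are computed, the forward loop in `gepp_solve` can apply each $M_k$ from the column in which it was stored.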


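Example 3.4.1 If Algorithm 3.4.1 is applied to
$$A = \begin{bmatrix} 3 & 17 & 10 \\ 2 & 4 & -2 \\ 6 & 18 & -12 \end{bmatrix},$$
then upon exit
$$A = \begin{bmatrix} 6 & 18 & -12 \\ 1/3 & 8 & 16 \\ 1/2 & -1/4 & 6 \end{bmatrix}$$
and $p = [3, 3]$. These two quantities encode all the information associated with the reduction:
$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 1/4 & 1 \end{bmatrix}\begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}\begin{bmatrix} 1 & 0 & 0 \\ -1/3 & 1 & 0 \\ -1/2 & 0 & 1 \end{bmatrix}\begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix}A = \begin{bmatrix} 6 & 18 & -12 \\ 0 & 8 & 16 \\ 0 & 0 & 6 \end{bmatrix}.$$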


3.4.4 Where is L?
Gaussian elimination with partial pivoting computes the LU factorization of 
a row permuted version of A. The proof is a messy subscripting argument. 


Theorem 3.4.1 If Gaussian elimination with partial pivoting is used to compute the upper triangularization
$$M_{n-1}E_{n-1}\cdots M_1E_1A = U \tag{3.4.1}$$
via Algorithm 3.4.1, then
$$PA = LU$$
where $P = E_{n-1}\cdots E_1$ and L is a unit lower triangular matrix with $|\ell_{ij}| \le 1$. The kth column of L below the diagonal is a permuted version of the kth Gauss vector. In particular, if $M_k = I - \tau^{(k)}e_k^T$, then $L(k+1:n,k) = g(k+1:n)$ where $g = E_{n-1}\cdots E_{k+1}\tau^{(k)}$.


Proof. A manipulation of (3.4.1) reveals that $\tilde{M}_{n-1}\cdots\tilde{M}_1PA = U$ where $\tilde{M}_{n-1} = M_{n-1}$ and
$$\tilde{M}_k = E_{n-1}\cdots E_{k+1}M_kE_{k+1}\cdots E_{n-1}, \qquad k \le n-2.$$
Since each $E_j$ is an interchange permutation involving row j and a row μ with $\mu \ge j$, we have $E_j(1:j-1,1:j-1) = I_{j-1}$. It follows that each $\tilde{M}_k$ is a Gauss transform with Gauss vector $\tilde{\tau}^{(k)} = E_{n-1}\cdots E_{k+1}\tau^{(k)}$. □


As a consequence of the theorem, it is easy to see how to change Algorithm 3.4.1 so that upon completion, $A(i,j)$ houses $L(i,j)$ for all $i > j$. We merely apply each $E_k$ to all the previously computed Gauss vectors. This is accomplished by changing the line "A(k,k:n) ↔ A(μ,k:n)" in Algorithm 3.4.1 to "A(k,1:n) ↔ A(μ,1:n)".


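Example 3.4.2 The factorization $PA = LU$ of the matrix in Example 3.4.1 is given by
$$\begin{bmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}\begin{bmatrix} 3 & 17 & 10 \\ 2 & 4 & -2 \\ 6 & 18 & -12 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 1/2 & 1 & 0 \\ 1/3 & -1/4 & 1 \end{bmatrix}\begin{bmatrix} 6 & 18 & -12 \\ 0 & 8 & 16 \\ 0 & 0 & 6 \end{bmatrix}.$$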
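Swapping whole rows, as just described, can be sketched as follows. This is our own illustration (hypothetical names `gepp_lu` and `split_lu`, 0-based indices); with `Fraction` entries it reproduces the factors of Example 3.4.2 exactly.

```python
from fractions import Fraction


def gepp_lu(A):
    """Algorithm 3.4.1 with the swap A(k,1:n) <-> A(mu,1:n), on a copy of A.

    The strict lower triangle ends up holding L and the upper triangle U.
    perm[i] is the original index of row i of PA, i.e. (PA)[i] = A[perm[i]].
    """
    A = [row[:] for row in A]
    n = len(A)
    perm = list(range(n))
    for k in range(n - 1):
        mu = max(range(k, n), key=lambda i: abs(A[i][k]))
        A[k], A[mu] = A[mu], A[k]              # whole rows, old multipliers included
        perm[k], perm[mu] = perm[mu], perm[k]
        if A[k][k] != 0:
            for i in range(k + 1, n):
                A[i][k] /= A[k][k]
                for j in range(k + 1, n):
                    A[i][j] -= A[i][k] * A[k][j]
    return A, perm


def split_lu(LU):
    """Separate the packed array into unit lower triangular L and upper triangular U."""
    n = len(LU)
    L = [[LU[i][j] if j < i else int(i == j) for j in range(n)] for i in range(n)]
    U = [[LU[i][j] if j >= i else 0 for j in range(n)] for i in range(n)]
    return L, U


A = [[Fraction(v) for v in row] for row in [[3, 17, 10], [2, 4, -2], [6, 18, -12]]]
LU, perm = gepp_lu(A)
L, U = split_lu(LU)
PA = [A[i] for i in perm]                      # rows of A in pivot order
```

Here `perm` is an alternative encoding of P to the swap vector p: it lists final positions rather than individual interchanges.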


3.4.5 The Gaxpy Version 


In §3.2 we developed outer product and gaxpy schemes for computing the
LU factorization. Having just incorporated pivoting in the outer product 
version, it is natural to do the same with the gaxpy approach. Recall from 
(3.2.5) the general structure of the gaxpy LU process: 


L = I
U = 0
for j = 1:n
    if j = 1
        v(j:n) = A(j:n,j)
    else
        Solve L(1:j-1,1:j-1)z = A(1:j-1,j) for z
            and set U(1:j-1,j) = z.
        v(j:n) = A(j:n,j) - L(j:n,1:j-1)z
    end
    if j < n
        L(j+1:n,j) = v(j+1:n)/v(j)
    end
    U(j,j) = v(j)
end


With partial pivoting we search $|v(j:n)|$ for its maximal element and proceed accordingly. Assuming A is nonsingular so no zero pivots are encountered we obtain
L = I, U = 0
for j = 1:n
    if j = 1
        v(j:n) = A(j:n,j)
    else
        Solve L(1:j-1,1:j-1)z = A(1:j-1,j)
            for z and set U(1:j-1,j) = z.
        v(j:n) = A(j:n,j) - L(j:n,1:j-1)z
    end                                                    (3.4.2)
    if j < n
        Determine μ with j ≤ μ ≤ n so |v(μ)| = ‖v(j:n)‖_∞.
        p(j) = μ
        v(j) ↔ v(μ)
        A(j,j+1:n) ↔ A(μ,j+1:n)
        L(j+1:n,j) = v(j+1:n)/v(j)
        if j > 1
            L(j,1:j-1) ↔ L(μ,1:j-1)
        end
    end
    U(j,j) = v(j)
end


In this implementation, we emerge with the factorization $PA = LU$ where $P = E_{n-1}\cdots E_1$ and $E_k$ is obtained by interchanging rows k and $p(k)$ of the n-by-n identity. As with Algorithm 3.4.1, this procedure requires $2n^3/3$ flops and $O(n^2)$ comparisons.
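A Python rendering of (3.4.2) might look like the sketch below. It uses our own conventions (0-based indices, hypothetical function name `gaxpy_lu_pp`, A assumed nonsingular as in the text); each pass of the outer loop computes one column of L and U from one column of A.

```python
from fractions import Fraction


def gaxpy_lu_pp(A):
    """Sketch of the gaxpy procedure (3.4.2) with partial pivoting, 0-based.

    Returns L, U and the pivot vector p (row j was interchanged with row p[j])
    so that PA = LU. Assumes A is nonsingular, as in the text.
    """
    A = [row[:] for row in A]
    n = len(A)
    L = [[int(i == j) for j in range(n)] for i in range(n)]
    U = [[0] * n for _ in range(n)]
    p = []
    for j in range(n):
        # Solve L(0:j,0:j) z = A(0:j,j) by forward substitution; z = U(0:j,j).
        z = []
        for i in range(j):
            z.append(A[i][j] - sum(L[i][m] * z[m] for m in range(i)))
            U[i][j] = z[i]
        # v(j:n) = A(j:n,j) - L(j:n,0:j) z
        v = [A[i][j] - sum(L[i][m] * z[m] for m in range(j)) for i in range(j, n)]
        if j < n - 1:
            mu = j + max(range(n - j), key=lambda i: abs(v[i]))
            p.append(mu)
            v[0], v[mu - j] = v[mu - j], v[0]                       # v(j) <-> v(mu)
            A[j][j + 1:], A[mu][j + 1:] = A[mu][j + 1:], A[j][j + 1:]
            for i in range(j + 1, n):
                L[i][j] = v[i - j] / v[0]
            L[j][:j], L[mu][:j] = L[mu][:j], L[j][:j]               # earlier rows of L
        U[j][j] = v[0]
    return L, U, p


A = [[Fraction(v) for v in row] for row in [[3, 17, 10], [2, 4, -2], [6, 18, -12]]]
L, U, p = gaxpy_lu_pp(A)
```

On the matrix of Example 3.4.1 this produces the same L, U, and p as the outer product version, as one would expect in exact arithmetic.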


3.4.6 Error Analysis 


We now examine the stability that is obtained with partial pivoting. This requires an accounting of the rounding errors that are sustained during elimination and during the triangular system solving. Bearing in mind that there are no rounding errors associated with permutation, it is not hard to show using Theorem 3.3.2 that the computed solution $\hat{x}$ satisfies $(A + E)\hat{x} = b$ where
$$|E| \le nu\left(3|A| + 5\hat{P}^T|\hat{L}||\hat{U}|\right) + O(u^2). \tag{3.4.3}$$
Here we are assuming that $\hat{P}$, $\hat{L}$, and $\hat{U}$ are the computed analogs of P, L, and U as produced by the above algorithms. Pivoting implies that the elements of $\hat{L}$ are bounded by one. Thus $\|\hat{L}\|_\infty \le n$ and we obtain the bound
$$\|E\|_\infty \le nu\left(3\|A\|_\infty + 5n\|\hat{U}\|_\infty\right) + O(u^2). \tag{3.4.4}$$
The problem now is to bound $\|\hat{U}\|_\infty$. Define the growth factor ρ by
$$\rho = \max_{i,j,k}\frac{|\hat{a}_{ij}^{(k)}|}{\|A\|_\infty} \tag{3.4.5}$$
where $\hat{A}^{(k)}$ is the computed version of the matrix $A^{(k)} = M_kE_k\cdots M_1E_1A$. Since $\hat{U} = \hat{A}^{(n-1)}$, we have $\|\hat{U}\|_\infty \le n\rho\|A\|_\infty$, and it follows that
$$\|E\|_\infty \le 8n^3\rho\|A\|_\infty u + O(u^2). \tag{3.4.6}$$
Whether or not this compares favorably with the ideal bound (3.3.1) hinges upon the size of the growth factor ρ. (The factor $n^3$ is not an operating factor in practice and may be ignored in this discussion.) The growth factor measures how large the numbers become during the process of elimination. In practice, ρ is usually of order 10 but it can also be as large as $2^{n-1}$. Despite this, most numerical analysts regard the occurrence of serious element growth in Gaussian elimination with partial pivoting as highly unlikely in practice. The method can be used with confidence.


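Example 3.4.3 If Gaussian elimination with partial pivoting is applied to the problem
$$\begin{bmatrix} .001 & 1.00 \\ 1.00 & 2.00 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 1.00 \\ 3.00 \end{bmatrix}$$
with $\beta = 10$, $t = 3$, floating point arithmetic, then
$$P = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \quad \hat{L} = \begin{bmatrix} 1.00 & 0 \\ .001 & 1.00 \end{bmatrix}, \quad \hat{U} = \begin{bmatrix} 1.00 & 2.00 \\ 0 & 1.00 \end{bmatrix}$$
and $\hat{x} = (1.00, .996)^T$. Compare with Example 3.3.1.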


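Example 3.4.4 If $A \in \mathbb{R}^{n\times n}$ is defined by
$$a_{ij} = \begin{cases} 1 & \text{if } i = j \text{ or } j = n \\ -1 & \text{if } i > j \\ 0 & \text{otherwise} \end{cases}$$
then A has an LU factorization with $|\ell_{ij}| \le 1$ and $u_{nn} = 2^{n-1}$.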
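The element growth in Example 3.4.4 is easy to observe numerically. The sketch below (hypothetical helper names, Python floats) builds the matrix and runs partial pivoting. Every multiplier is -1 and every entry stays a small power of two, so the floating point computation happens to be exact here.

```python
def growth_matrix(n):
    """Matrix of Example 3.4.4: 1 on the diagonal and in the last column,
    -1 below the diagonal, 0 elsewhere."""
    return [[1.0 if (i == j or j == n - 1) else (-1.0 if i > j else 0.0)
             for j in range(n)] for i in range(n)]


def gepp_u(A):
    """Gaussian elimination with partial pivoting; returns the final U (a copy)."""
    A = [row[:] for row in A]
    n = len(A)
    for k in range(n - 1):
        # max() returns the first maximal index, so ties keep the diagonal pivot.
        mu = max(range(k, n), key=lambda i: abs(A[i][k]))
        A[k], A[mu] = A[mu], A[k]
        for i in range(k + 1, n):
            m = A[i][k] / A[k][k]
            for j in range(k, n):
                A[i][j] -= m * A[k][j]
    return A


for n in (4, 8, 16):
    print(n, gepp_u(growth_matrix(n))[n - 1][n - 1])   # u_nn = 2**(n-1)
```

At each step the pivot row is added to every row below it, which doubles the entries in the last column.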


3.4.7 Block Gaussian Elimination 


Gaussian Elimination with partial pivoting can be organized so that it is 
rich in level-3 operations. We detail a block outer product procedure but 
block gaxpy and block dot product formulations are also possible. See 
Dayde and Duff (1988). 

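Assume $A \in \mathbb{R}^{n\times n}$ and for clarity that $n = rN$. Partition A as follows:
$$A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}, \qquad A_{11} \in \mathbb{R}^{r\times r}, \quad A_{22} \in \mathbb{R}^{(n-r)\times(n-r)}.$$
The first step in the block reduction is typical and proceeds as follows: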


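• Use scalar Gaussian elimination with partial pivoting (e.g., a rectangular version of Algorithm 3.4.1) to compute permutation $P_1 \in \mathbb{R}^{n\times n}$, unit lower triangular $L_{11} \in \mathbb{R}^{r\times r}$, and upper triangular $U_{11} \in \mathbb{R}^{r\times r}$ so
$$P_1\begin{bmatrix} A_{11} \\ A_{21} \end{bmatrix} = \begin{bmatrix} L_{11} \\ L_{21} \end{bmatrix}U_{11}.$$
• Apply the $P_1$ across the rest of A:
$$\begin{bmatrix} \tilde{A}_{12} \\ \tilde{A}_{22} \end{bmatrix} = P_1\begin{bmatrix} A_{12} \\ A_{22} \end{bmatrix}.$$
• Solve the lower triangular multiple right hand side problem
$$L_{11}U_{12} = \tilde{A}_{12}.$$
• Perform the level-3 update
$$\tilde{A} = \tilde{A}_{22} - L_{21}U_{12}.$$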


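With these computations we obtain the factorization
$$P_1A = \begin{bmatrix} L_{11} & 0 \\ L_{21} & I_{n-r} \end{bmatrix}\begin{bmatrix} I_r & 0 \\ 0 & \tilde{A} \end{bmatrix}\begin{bmatrix} U_{11} & U_{12} \\ 0 & I_{n-r} \end{bmatrix}.$$
The process is then repeated on the first r columns of $\tilde{A}$.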

In general, during step k ($1 \le k \le N-1$) of the block algorithm we apply scalar Gaussian elimination to a matrix of size $(n-(k-1)r)$-by-r. An r-by-$(n-kr)$ multiple right hand side system is solved and a level-3 update of size $(n-kr)$-by-$(n-kr)$ is performed. The level-3 fraction for the overall process is approximately given by $1 - 3/(2N)$. Thus, for large N the procedure is rich in matrix multiplication.
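The block step can be sketched in plain Python as below. This only illustrates the data flow (panel factorization, permutation, triangular solve, level-3 update); pure Python loops do not deliver the performance that a level-3 BLAS matrix multiply would. The name `block_lu`, the 0-based indexing, and letting the last block be narrower when r does not divide n are our assumptions.

```python
from fractions import Fraction


def block_lu(A, r):
    """Block outer-product Gaussian elimination with partial pivoting (a sketch).

    Returns (LU, perm): the packed L and U factors of PA and the row order perm,
    with (PA)[i] = A[perm[i]].
    """
    A = [row[:] for row in A]
    n = len(A)
    perm = list(range(n))
    for k0 in range(0, n, r):
        k1 = min(k0 + r, n)
        # 1. Rectangular GEPP on the panel A(k0:n, k0:k1). Whole rows are swapped,
        #    which also applies the permutation across the rest of A.
        for k in range(k0, k1):
            mu = max(range(k, n), key=lambda i: abs(A[i][k]))
            A[k], A[mu] = A[mu], A[k]
            perm[k], perm[mu] = perm[mu], perm[k]
            if A[k][k] != 0:
                for i in range(k + 1, n):
                    A[i][k] /= A[k][k]
                    for j in range(k + 1, k1):         # update inside the panel only
                        A[i][j] -= A[i][k] * A[k][j]
        # 2. Multiple right-hand-side triangular solve L11 U12 = A12.
        for j in range(k1, n):
            for i in range(k0, k1):
                A[i][j] -= sum(A[i][m] * A[m][j] for m in range(k0, i))
        # 3. Level-3 update A22 = A22 - L21 U12 (a matrix-matrix product).
        for i in range(k1, n):
            for j in range(k1, n):
                A[i][j] -= sum(A[i][m] * A[m][j] for m in range(k0, k1))
    return A, perm


A = [[Fraction(v) for v in row] for row in
     [[2, 1, 1, 0, 3], [4, 3, 3, 1, 1], [8, 7, 9, 5, 2], [6, 7, 9, 8, 4], [1, 2, 3, 4, 5]]]
LU1, perm1 = block_lu(A, 1)     # r = 1: ordinary outer-product elimination
LU2, perm2 = block_lu(A, 2)     # r = 2: the last block has width 1
```

In exact arithmetic the pivot choices do not depend on r, so every block width yields the same factors; only the order in which the updates are performed changes.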


3.4.8 Complete Pivoting 


Another pivot strategy called complete pivoting has the property that the associated growth factor bound is considerably smaller than $2^{n-1}$. Recall that in partial pivoting, the kth pivot is determined by scanning the current subcolumn $A(k:n,k)$. In complete pivoting, the largest entry in the current submatrix $A(k:n,k:n)$ is permuted into the $(k,k)$ position. Thus, we compute the upper triangularization $M_{n-1}E_{n-1}\cdots M_1E_1AF_1\cdots F_{n-1} = U$ with the property that in step k we are confronted with the matrix
$$A^{(k-1)} = M_{k-1}E_{k-1}\cdots M_1E_1AF_1\cdots F_{k-1}$$
and determine interchange permutations $E_k$ and $F_k$ such that
$$\left|(E_kA^{(k-1)}F_k)_{kk}\right| = \max_{k \le i,j \le n}\left|(E_kA^{(k-1)}F_k)_{ij}\right|.$$


We have the analog of Theorem 3.4.1.

Theorem 3.4.2 If Gaussian elimination with complete pivoting is used to compute the upper triangularization
$$M_{n-1}E_{n-1}\cdots M_1E_1AF_1\cdots F_{n-1} = U \tag{3.4.7}$$
then
$$PAQ = LU$$
where $P = E_{n-1}\cdots E_1$, $Q = F_1\cdots F_{n-1}$, and L is a unit lower triangular matrix with $|\ell_{ij}| \le 1$. The kth column of L below the diagonal is a permuted version of the kth Gauss vector. In particular, if $M_k = I - \tau^{(k)}e_k^T$, then $L(k+1:n,k) = g(k+1:n)$ where $g = E_{n-1}\cdots E_{k+1}\tau^{(k)}$.

Proof. The proof is similar to the proof of Theorem 3.4.1. Details are left to the reader. □


Here is Gaussian elimination with complete pivoting in detail:

Algorithm 3.4.2 (Gaussian Elimination with Complete Pivoting) This algorithm computes the complete pivoting factorization $PAQ = LU$ where L is unit lower triangular and U is upper triangular. $P = E_{n-1}\cdots E_1$ and $Q = F_1\cdots F_{n-1}$ are products of interchange permutations. $A(1:k,k)$ is overwritten by $U(1:k,k)$, $k = 1:n$. $A(k+1:n,k)$ is overwritten by $L(k+1:n,k)$, $k = 1:n-1$. $E_k$ interchanges rows k and $p(k)$. $F_k$ interchanges columns k and $q(k)$.

for k = 1:n-1
    Determine μ with k ≤ μ ≤ n and λ with k ≤ λ ≤ n so
        |A(μ,λ)| = max{ |A(i,j)| : i = k:n, j = k:n }
    A(k,1:n) ↔ A(μ,1:n)
    A(1:n,k) ↔ A(1:n,λ)
    p(k) = μ
    q(k) = λ
    if A(k,k) ≠ 0
        rows = k+1:n
        A(rows,k) = A(rows,k)/A(k,k)
        A(rows,rows) = A(rows,rows) - A(rows,k)A(k,rows)
    end
end
This algorithm requires $2n^3/3$ flops and $O(n^3)$ comparisons. Unlike partial pivoting, complete pivoting involves a significant overhead because of the two-dimensional search at each stage.
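A direct transcription of Algorithm 3.4.2 into Python is sketched below (0-based indices; `gecp` is a hypothetical name). The nested generator performs the two-dimensional search that makes complete pivoting more expensive than partial pivoting.

```python
from fractions import Fraction


def gecp(A):
    """Sketch of Algorithm 3.4.2 (0-based) on a copy of A.

    Returns (LU, p, q): packed factors with PAQ = LU, where step k
    interchanged rows k, p[k] and columns k, q[k].
    """
    A = [row[:] for row in A]
    n = len(A)
    p, q = [], []
    for k in range(n - 1):
        # Two-dimensional search over the trailing submatrix A(k:n, k:n).
        mu, lam = max(((i, j) for i in range(k, n) for j in range(k, n)),
                      key=lambda ij: abs(A[ij[0]][ij[1]]))
        A[k], A[mu] = A[mu], A[k]                    # A(k,1:n) <-> A(mu,1:n)
        for row in A:                                # A(1:n,k) <-> A(1:n,lam)
            row[k], row[lam] = row[lam], row[k]
        p.append(mu)
        q.append(lam)
        if A[k][k] != 0:
            for i in range(k + 1, n):
                A[i][k] /= A[k][k]
                for j in range(k + 1, n):
                    A[i][j] -= A[i][k] * A[k][j]
    return A, p, q


# Exact version of the 2-by-2 matrix used in Examples 3.4.3 and 3.4.6.
LU, p, q = gecp([[Fraction(1, 1000), Fraction(1)], [Fraction(1), Fraction(2)]])
```

The column swap never touches L, because the stored multipliers live in columns to the left of k while λ ≥ k.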


3.4.9 Comments on Complete Pivoting 


Suppose rank(A) = r < n. It follows that at the beginning of step r+1, $A(r+1:n,r+1:n) = 0$. This implies that $E_k = F_k = M_k = I$ for $k = r+1:n$ and so the algorithm can be terminated after step r with the following factorization in hand:
$$PAQ = LU = \begin{bmatrix} L_{11} & 0 \\ L_{21} & I_{n-r} \end{bmatrix}\begin{bmatrix} U_{11} & U_{12} \\ 0 & 0 \end{bmatrix}.$$
Here $L_{11}$ and $U_{11}$ are r-by-r and $L_{21}$ and $U_{12}^T$ are $(n-r)$-by-r. Thus, Gaussian elimination with complete pivoting can in principle be used to determine the rank of a matrix. Yet roundoff errors make the probability of encountering an exactly zero pivot remote. In practice one would have to "declare" A to have rank k if the pivot element in step k+1 was sufficiently small. The numerical rank determination problem is discussed in detail in §5.4.

Wilkinson (1961) has shown that in exact arithmetic the elements of the matrix $A^{(k)} = M_kE_k\cdots M_1E_1AF_1\cdots F_k$ satisfy
$$|a_{ij}^{(k)}| \le k^{1/2}\left(2 \cdot 3^{1/2} \cdots k^{1/(k-1)}\right)^{1/2}\max|a_{ij}|. \tag{3.4.8}$$
The upper bound is a rather slow-growing function of k. This fact, coupled with vast empirical evidence suggesting that ρ is always modestly sized (e.g., ρ = 10), permits us to conclude that Gaussian elimination with complete pivoting is stable. The method solves a nearby linear system $(A + E)\hat{x} = b$ exactly in the sense of (3.3.1). However, there appears to be no practical justification for choosing complete pivoting over partial pivoting except in cases where rank determination is an issue.
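To see how slowly the bound (3.4.8) grows, one can simply evaluate it. The short sketch below (our own helper `wilkinson_bound`) works with logarithms to avoid overflow and compares the result with the partial pivoting bound $2^{k-1}$.

```python
import math


def wilkinson_bound(k):
    """Growth bound (3.4.8) for complete pivoting:
    k^(1/2) * (2 * 3^(1/2) * 4^(1/3) * ... * k^(1/(k-1)))^(1/2)."""
    log_prod = sum(math.log(j) / (j - 1) for j in range(2, k + 1))
    return math.sqrt(k) * math.exp(0.5 * log_prod)


for k in (2, 10, 50, 100):
    print(k, round(wilkinson_bound(k), 1), 2.0 ** (k - 1))
```

For k = 2 the bound equals 2, matching $2^{k-1}$; beyond that it falls behind the exponential very quickly.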


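Example 3.4.6 If Gaussian elimination with complete pivoting is applied to the problem
$$\begin{bmatrix} .001 & 1.00 \\ 1.00 & 2.00 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 1.00 \\ 3.00 \end{bmatrix}$$
with $\beta = 10$, $t = 3$, floating point arithmetic, then
$$P = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \quad Q = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \quad \hat{L} = \begin{bmatrix} 1.00 & 0.00 \\ .500 & 1.00 \end{bmatrix}, \quad \hat{U} = \begin{bmatrix} 2.00 & 1.00 \\ 0.00 & -.499 \end{bmatrix}$$
and $\hat{x} = [1.00, 1.00]^T$. Compare with Examples 3.3.1 and 3.4.3.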


3.4.10 The Avoidance of Pivoting 


For certain classes of matrices it is not necessary to pivot. It is important to identify such classes because pivoting usually degrades performance. To illustrate the kind of analysis required to prove that pivoting can be safely avoided, we consider the case of diagonally dominant matrices. We say that $A \in \mathbb{R}^{n\times n}$ is strictly diagonally dominant if
$$|a_{ii}| > \sum_{\substack{j=1 \\ j \ne i}}^{n}|a_{ij}|, \qquad i = 1:n.$$
The following theorem shows how this property can ensure a nice, no-pivoting LU factorization.

Theorem 3.4.3 If $A^T$ is strictly diagonally dominant, then A has an LU factorization and $|\ell_{ij}| \le 1$. In other words, if Algorithm 3.4.1 is applied, then $P = I$.
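Proof. Partition A as follows
$$A = \begin{bmatrix} \alpha & w^T \\ v & C \end{bmatrix}$$
where α is 1-by-1 and note that after one step of the outer product LU process we have the factorization
$$\begin{bmatrix} \alpha & w^T \\ v & C \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ v/\alpha & I \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 0 & C - vw^T/\alpha \end{bmatrix}\begin{bmatrix} \alpha & w^T \\ 0 & I \end{bmatrix}.$$
The theorem follows by induction on n if we can show that the transpose of $B = C - vw^T/\alpha$ is strictly diagonally dominant. This is because we may then assume that B has an LU factorization $B = L_1U_1$, and that implies
$$A = \begin{bmatrix} 1 & 0 \\ v/\alpha & L_1 \end{bmatrix}\begin{bmatrix} \alpha & w^T \\ 0 & U_1 \end{bmatrix}.$$
Moreover, strict diagonal dominance of the first column of A gives $|v_i| < |\alpha|$, so the multipliers in $v/\alpha$ are bounded by one in modulus. But the proof that $B^T$ is strictly diagonally dominant is straightforward. From the definitions we have
$$\begin{aligned}
\sum_{\substack{i=1 \\ i \ne j}}^{n-1}|b_{ij}| &= \sum_{\substack{i=1 \\ i \ne j}}^{n-1}\left|c_{ij} - v_iw_j/\alpha\right| \le \sum_{\substack{i=1 \\ i \ne j}}^{n-1}|c_{ij}| + \frac{|w_j|}{|\alpha|}\sum_{\substack{i=1 \\ i \ne j}}^{n-1}|v_i| \\
&< \left(|c_{jj}| - |w_j|\right) + \frac{|w_j|}{|\alpha|}\left(|\alpha| - |v_j|\right) \\
&= |c_{jj}| - \frac{|w_jv_j|}{|\alpha|} \le \left|c_{jj} - \frac{w_jv_j}{\alpha}\right| = |b_{jj}|. \qquad \square
\end{aligned}$$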
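Theorem 3.4.3 can be checked on a small example with the following sketch (hypothetical helper names, exact `Fraction` arithmetic). It tests column diagonal dominance, runs LU without pivoting, and records whether partial pivoting would have chosen the diagonal pivot at every step.

```python
from fractions import Fraction


def column_dominant(A):
    """True if A^T is strictly diagonally dominant: |a_jj| > sum_{i != j} |a_ij|."""
    n = len(A)
    return all(abs(A[j][j]) > sum(abs(A[i][j]) for i in range(n) if i != j)
               for j in range(n))


def lu_no_pivot(A):
    """Outer-product LU without pivoting on a copy of A.

    Also reports whether, at every step, the diagonal entry was already largest
    in modulus in its column, i.e. whether partial pivoting would give P = I.
    """
    A = [row[:] for row in A]
    n = len(A)
    pivots_ok = True
    for k in range(n - 1):
        col_max = max(abs(A[i][k]) for i in range(k, n))
        pivots_ok = pivots_ok and abs(A[k][k]) == col_max
        for i in range(k + 1, n):
            A[i][k] /= A[k][k]
            for j in range(k + 1, n):
                A[i][j] -= A[i][k] * A[k][j]
    return A, pivots_ok


A = [[Fraction(v) for v in row] for row in [[4, 1, -1], [1, 5, 2], [-2, 1, 6]]]
LU, ok = lu_no_pivot(A)
```

For this matrix the multipliers come out as 1/4, -1/2, and 6/19, all strictly less than one in modulus, as the proof predicts.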
3.4.11 Some Applications


We conclude with some examples that illustrate how to think in terms of matrix factorizations when confronted with various linear equation situations.

Suppose A is nonsingular and n-by-n and that B is n-by-p. Consider the problem of finding X (n-by-p) so $AX = B$, i.e., the multiple right hand side problem. If $X = [x_1, \ldots, x_p]$ and $B = [b_1, \ldots, b_p]$ are column partitions, then

Compute PA = LU.
for k = 1:p
    Solve Ly = Pb_k                                        (3.4.9)
    Solve Ux_k = y
end


Note that A is factored just once. If $B = I_n$, then we emerge with a computed $A^{-1}$.

As another example of getting the LU factorization "outside the loop," suppose we want to solve the linear system $A^kx = b$ where $A \in \mathbb{R}^{n\times n}$, $b \in \mathbb{R}^n$, and k is a positive integer. One approach is to compute $C = A^k$ and then solve $Cx = b$. However, the matrix multiplications can be avoided altogether:

Compute PA = LU
for j = 1:k
    Overwrite b with the solution to Ly = Pb.              (3.4.10)
    Overwrite b with the solution to Ux = b.
end


As a final example we show how to avoid the pitfall of explicit inverse computation. Suppose we are given $A \in \mathbb{R}^{n\times n}$, $d \in \mathbb{R}^n$, and $c \in \mathbb{R}^n$ and that we want to compute $s = c^TA^{-1}d$. One approach is to compute $X = A^{-1}$ as suggested above and then compute $s = c^TXd$. A more economical procedure is to compute $PA = LU$ and then solve the triangular systems $Ly = Pd$ and $Ux = y$. It follows that $s = c^Tx$. The point of this example is to stress that when a matrix inverse is encountered in a formula, we must think in terms of solving equations rather than in terms of explicit inverse formation.
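The three applications above share one pattern: factor once, then solve. A minimal Python sketch follows; the names `lu_factor`, `lu_solve`, `solve_power`, and `bilinear_inverse` are hypothetical, and exact `Fraction` arithmetic is used only so the checks can be exact.

```python
from fractions import Fraction


def lu_factor(A):
    """PA = LU by partial pivoting with whole-row swaps; (PA)[i] = A[perm[i]]."""
    A = [row[:] for row in A]
    n = len(A)
    perm = list(range(n))
    for k in range(n - 1):
        mu = max(range(k, n), key=lambda i: abs(A[i][k]))
        A[k], A[mu] = A[mu], A[k]
        perm[k], perm[mu] = perm[mu], perm[k]
        for i in range(k + 1, n):
            A[i][k] /= A[k][k]
            for j in range(k + 1, n):
                A[i][j] -= A[i][k] * A[k][j]
    return A, perm


def lu_solve(LU, perm, b):
    """Solve Ly = Pb, then Ux = y; O(n^2) work once the factors are known."""
    n = len(LU)
    y = [b[perm[i]] for i in range(n)]
    for i in range(n):
        y[i] -= sum(LU[i][j] * y[j] for j in range(i))
    for i in reversed(range(n)):
        y[i] = (y[i] - sum(LU[i][j] * y[j] for j in range(i + 1, n))) / LU[i][i]
    return y


def solve_power(LU, perm, b, k):
    """Solve A^k x = b with k pairs of triangular solves, as in (3.4.10)."""
    for _ in range(k):
        b = lu_solve(LU, perm, b)
    return b


def bilinear_inverse(LU, perm, c, d):
    """s = c^T A^{-1} d without forming A^{-1}: solve Ax = d, then s = c^T x."""
    x = lu_solve(LU, perm, d)
    return sum(ci * xi for ci, xi in zip(c, x))


A = [[Fraction(v) for v in row] for row in [[3, 17, 10], [2, 4, -2], [6, 18, -12]]]
LU, perm = lu_factor(A)                                     # factor once ...
n = len(A)
X = [lu_solve(LU, perm, [int(i == k) for i in range(n)])   # ... then (3.4.9) with B = I
     for k in range(n)]                                     # X[k] is column k of A^{-1}
```

Only `lu_factor` costs $O(n^3)$; every subsequent call reuses the stored factors.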


Problems 


P3.4.1 Let $A = LU$ be the LU factorization of n-by-n A with $|\ell_{ij}| \le 1$. Let $a_i^T$ and $u_i^T$ denote the ith rows of A and U, respectively. Verify the equation
$$u_i^T = a_i^T - \sum_{j=1}^{i-1}\ell_{ij}u_j^T$$
and use it to show that $\|U\|_\infty \le 2^{n-1}\|A\|_\infty$. (Hint: Take norms and use induction.)

P3.4.2 Show that if $PAQ = LU$ is obtained via Gaussian elimination with complete pivoting, then no element of $U(i,i:n)$ is larger in absolute value than $|u_{ii}|$.

P3.4.3 Suppose $A \in \mathbb{R}^{n\times n}$ has an LU factorization and that L and U are known. Give an algorithm which can compute the $(i,j)$ entry of $A^{-1}$ in approximately $(n-j)^2 + (n-i)^2$ flops.

P3.4.4 Suppose $\hat{X}$ is the computed inverse obtained via (3.4.9). Give an upper bound for $\|A\hat{X} - I\|_F$.

P3.4.5 Prove Theorem 3.4.2.

P3.4.6 Extend Algorithm 3.4.2 so that it can factor an arbitrary rectangular matrix.

P3.4.7 Write a detailed version of the block elimination algorithm outlined in §3.4.7.
Just as there are six “conventional” versions of scalar Gaussian elimination, there are also six conventional block formulations of Gaussian elimination. For a discussion of
3.5 Improving and Estimating Accuracy


Suppose Gaussian elimination with partial pivoting is used to solve the n-by-n system $Ax = b$. Assume t-digit, base β floating point arithmetic is used. Equation (3.4.6) essentially says that if the growth factor is modest then the computed solution $\hat{x}$ satisfies
$$(A + E)\hat{x} = b, \qquad \|E\|_\infty \approx u\|A\|_\infty, \qquad u = \tfrac{1}{2}\beta^{1-t}. \tag{3.5.1}$$


In this section we explore the practical ramifications of this result. We begin 
by stressing the distinction that should be made between residual size and 
accuracy. This is followed by a discussion of scaling, iterative improvement, 
and condition estimation. See Higham (1996) for a more detailed treatment 
of these topics. 

We make two notational remarks at the outset. First, the infinity norm is used throughout since it is very handy in roundoff error analysis and in practical error estimation. Second, whenever we refer to "Gaussian elimination" in this section we really mean Gaussian elimination with some stabilizing pivot strategy such as partial pivoting.


3.5.1 Residual Size Versus Accuracy 


The residual of a computed solution $\hat{x}$ to the linear system $Ax = b$ is the vector $b - A\hat{x}$. A small residual means that $A\hat{x}$ effectively "predicts" the right hand side b. From (3.5.1) we have $\|b - A\hat{x}\|_\infty \approx u\|A\|_\infty\|\hat{x}\|_\infty$ and so we obtain

Heuristic I. Gaussian elimination produces a solution $\hat{x}$ with a relatively small residual.

Small residuals do not imply high accuracy. Combining (3.3.2) and (3.5.1), we see that
$$\frac{\|\hat{x} - x\|_\infty}{\|x\|_\infty} \approx u\,\kappa_\infty(A). \tag{3.5.2}$$
This justifies a second guiding principle.

Heuristic II. If the unit roundoff and condition satisfy $u \approx 10^{-d}$ and $\kappa_\infty(A) \approx 10^q$, then Gaussian elimination produces a solution $\hat{x}$ that has about $d - q$ correct decimal digits.

If $u\kappa_\infty(A)$ is large, then we say that A is ill-conditioned with respect to the machine precision.

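As an illustration of Heuristics I and II, consider the system
$$\begin{bmatrix} .986 & .579 \\ .409 & .237 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} .235 \\ .107 \end{bmatrix}$$
in which $\kappa_\infty(A) \approx 700$ and $x = (2, -3)^T$. Here is what we find for various machine precisions:

[Table: for each machine precision, the relative error $\|\hat{x} - x\|_\infty/\|x\|_\infty$ and the relative residual $\|b - A\hat{x}\|_\infty/(\|A\|_\infty\|\hat{x}\|_\infty)$]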


Whether or not one is content with the computed solution $\hat{x}$ depends on the requirements of the underlying source problem. In many applications accuracy is not important but small residuals are. In such a situation, the $\hat{x}$ produced by Gaussian elimination is probably adequate. On the other hand, if the number of correct digits in $\hat{x}$ is an issue then the situation is more complicated and the discussion in the remainder of this section is relevant.
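Heuristic II can be applied to this example directly. The sketch below (our own helper names; the explicit 2-by-2 inverse is acceptable only because the matrix is tiny) computes $\kappa_\infty(A)$ and the predicted number of correct digits $d - q$ for several precisions t, taking $u = \frac{1}{2}10^{1-t}$ so that $d = -\log_{10}u$.

```python
import math


def inf_norm(M):
    """||M||_inf = max_i sum_j |m_ij|."""
    return max(sum(abs(v) for v in row) for row in M)


def kappa_inf_2x2(A):
    """kappa_inf(A) = ||A||_inf ||A^{-1}||_inf for a nonsingular 2-by-2 A."""
    (a, b), (c, d) = A
    det = a * d - b * c
    Ainv = [[d / det, -b / det], [-c / det, a / det]]
    return inf_norm(A) * inf_norm(Ainv)


A = [[0.986, 0.579], [0.409, 0.237]]
kappa = kappa_inf_2x2(A)
print(round(kappa))                              # 698, consistent with kappa ~ 700
for t in (3, 4, 5, 6):
    u = 0.5 * 10.0 ** (1 - t)                    # unit roundoff, base 10, t digits
    print(t, round(-math.log10(u) - math.log10(kappa), 1))   # d - q
```

A negative value of d - q, as for t = 3, indicates that no correct digits should be expected from a single solve.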
3.5.2 Scaling

Let β be the machine base and define the diagonal matrices $D_1$ and $D_2$ by
$$D_1 = \mathrm{diag}(\beta^{r_1}, \ldots, \beta^{r_n}), \qquad D_2 = \mathrm{diag}(\beta^{c_1}, \ldots, \beta^{c_n}).$$
The solution to the n-by-n linear system $Ax = b$ can be found by solving the scaled system $(D_1^{-1}AD_2)y = D_1^{-1}b$ using Gaussian elimination and then setting $x = D_2y$. The scalings of A, b, and y require only $O(n^2)$ flops and may be accomplished without roundoff. Note that $D_1$ scales equations and $D_2$ scales unknowns.

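It follows from Heuristic II that if $\hat{x}$ and $\hat{y}$ are the computed versions of x and y, then
$$\frac{\|D_2^{-1}(\hat{x} - x)\|_\infty}{\|D_2^{-1}x\|_\infty} = \frac{\|\hat{y} - y\|_\infty}{\|y\|_\infty} \approx u\,\kappa_\infty(D_1^{-1}AD_2). \tag{3.5.3}$$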


Thus, if $\kappa_\infty(D_1^{-1}AD_2)$ can be made considerably smaller than $\kappa_\infty(A)$, then we might expect a correspondingly more accurate $\hat{x}$, provided errors are measured in the "$D_2$" norm defined by $\|z\|_{D_2} = \|D_2^{-1}z\|_\infty$. This is the objective of scaling. Note that it encompasses two issues: the condition of the scaled problem and the appropriateness of appraising error in the $D_2$-norm.

An interesting but very difficult mathematical problem concerns the exact minimization of $\kappa_p(D_1^{-1}AD_2)$ for general diagonal $D_i$ and various p. What results there are in this direction are not very practical. This is hardly discouraging, however, when we recall that (3.5.3) is heuristic and it makes little sense to minimize exactly a heuristic bound. What we seek is a fast, approximate method for improving the quality of the computed solution $\hat{x}$.

One technique of this variety is simple row scaling. In this scheme $D_2$ is the identity and $D_1$ is chosen so that each row in $D_1^{-1}A$ has approximately the same ∞-norm. Row scaling reduces the likelihood of adding a very small number to a very large number during elimination, an event that can greatly diminish accuracy.
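Simple row scaling with powers of the machine base can be sketched as follows. We assume binary floating point (β = 2), which is what Python floats use, so `math.frexp` and `math.ldexp` give exact power-of-two scalings; the function name `row_scale_pow2` is hypothetical.

```python
import math


def row_scale_pow2(A, b):
    """Simple row scaling: D2 = I and D1 = diag(2^e_1, ..., 2^e_n).

    Each e_i is chosen so that row i of D1^{-1} A has infinity norm in [1/2, 1).
    Scaling by a power of the binary base is exact, so no roundoff occurs.
    Zero rows are left unscaled.
    """
    SA, Sb, exps = [], [], []
    for row, bi in zip(A, b):
        s = sum(abs(v) for v in row)
        e = math.frexp(s)[1] if s > 0 else 0      # s = m * 2^e with 0.5 <= m < 1
        SA.append([math.ldexp(v, -e) for v in row])
        Sb.append(math.ldexp(bi, -e))
        exps.append(e)
    return SA, Sb, exps


# The unscaled system of Example 3.5.1.
SA, Sb, exps = row_scale_pow2([[10.0, 100000.0], [1.0, 1.0]], [100000.0, 2.0])
```

Because every scale factor is exact, the scaled system is mathematically equivalent to the original one; whether it is numerically better behaved depends on the problem, as the text goes on to stress.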

Slightly more complicated than simple row scaling is row-column equilibration. Here, the object is to choose $D_1$ and $D_2$ so that the ∞-norm of each row and column of $D_1^{-1}AD_2$ belongs to the interval $(1/\beta, 1]$ where β is the base of the floating point system. For work along these lines see McKeeman (1962).

It cannot be stressed too much that simple row scaling and row-column equilibration do not "solve" the scaling problem. Indeed, either technique can render a worse $\hat{x}$ than if no scaling whatever is used. The ramifications of this point are thoroughly discussed in Forsythe and Moler (1967, chapter 11). The basic recommendation is that the scaling of equations and unknowns must proceed on a problem-by-problem basis. General scaling strategies are unreliable. It is best to scale (if at all) on the basis of what the source problem proclaims about the significance of each $a_{ij}$. Measurement units and data error may have to be considered.


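Example 3.5.1 (Forsythe and Moler (1967, pp. 34, 40)) If
$$\begin{bmatrix} 10 & 100{,}000 \\ 1 & 1 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 100{,}000 \\ 2 \end{bmatrix}$$
and the equivalent row-scaled problem
$$\begin{bmatrix} .0001 & 1 \\ 1 & 1 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$$
are each solved using $\beta = 10$, $t = 3$ arithmetic, then solutions $\hat{x} = (0.00, 1.00)^T$ and $\hat{x} = (1.00, 1.00)^T$ are respectively computed. Note that $x = (1.0001\ldots, .9999\ldots)^T$ is the exact solution.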


3.5.3 Iterative Improvement 


Suppose $Ax = b$ has been solved via the partial pivoting factorization $PA = LU$ and that we wish to improve the accuracy of the computed solution $\hat{x}$. If we execute

r = b - Ax̂
Solve Ly = Pr.                                             (3.5.4)
Solve Uz = y.
x_new = x̂ + z

then in exact arithmetic $Ax_{new} = A\hat{x} + Az = (b - r) + r = b$. Unfortunately, the naive floating point execution of these formulae renders an $x_{new}$ that is no more accurate than $\hat{x}$. This is to be expected since $\hat{r} = fl(b - A\hat{x})$ has few, if any, correct significant digits. (Recall Heuristic I.) Consequently, $\hat{z} = fl(A^{-1}\hat{r}) \approx A^{-1}\cdot\text{noise} \approx \text{noise}$ is a very poor correction from the standpoint of improving the accuracy of $\hat{x}$. However, Skeel (1980) has done an error analysis that indicates when (3.5.4) gives an improved $x_{new}$ from the standpoint of backward error. In particular, if the quantity
$$\tau = \left(\|\,|A||A^{-1}|\,\|_\infty\right)\left(\frac{\max_i(|A||x|)_i}{\min_i(|A||x|)_i}\right)$$
is not too big, then (3.5.4) produces an $x_{new}$ such that $(A + E)x_{new} = b$ for very small E. Of course, if Gaussian elimination with partial pivoting is used then the computed $\hat{x}$ already solves a nearby system. However, this may not be the case for some of the pivot strategies that are used to preserve sparsity. In this situation, the fixed precision iterative improvement step (3.5.4) can be very worthwhile and cheap. See Arioli, Demmel, and Duff (1988).

For (3.5.4) to produce a more accurate $\hat{x}$, it is necessary to compute the residual $b - A\hat{x}$ with extended precision floating point arithmetic. Typically, this means that if t-digit arithmetic is used to compute $PA = LU$, x, y, and z, then 2t-digit arithmetic is used to form $b - A\hat{x}$, i.e., double precision. The process can be iterated. In particular, once we have computed $PA = LU$ and initialize $x = 0$, we repeat the following:

r = b - Ax  (Double Precision)
Solve Ly = Pr for y.                                       (3.5.5)
Solve Uz = y for z.
x = x + z

We refer to this process as mixed precision iterative improvement. The original A must be used in the double precision computation of r. The basic result concerning the performance of (3.5.5) is summarized in the following heuristic:

Heuristic III. If the machine precision u and condition satisfy $u = 10^{-d}$ and $\kappa_\infty(A) \approx 10^q$, then after k executions of (3.5.5), x has approximately $\min(d, k(d-q))$ correct digits.


Roughly speaking, if $u\kappa_\infty(A) \le 1$, then iterative improvement can ultimately produce a solution that is correct to full (single) precision. Note that the process is relatively cheap. Each improvement costs $O(n^2)$, to be compared with the original $O(n^3)$ investment in the factorization $PA = LU$. Of course, no improvement may result if A is badly enough conditioned with respect to the machine precision.

The primary drawback of mixed precision iterative improvement is that 
its implementation is somewhat machine-dependent. This discourages its 
use in software that is intended for wide distribution. The need for retaining 
an original copy of A is another aggravation associated with the method. 

On the other hand, mixed precision iterative improvement is usually very easy to implement on a given machine that has provision for the accumulation of inner products, i.e., provision for the double precision calculation of inner products between the rows of A and x. In a short mantissa computing environment the presence of an iterative improvement routine can significantly widen the class of solvable $Ax = b$ problems.


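Example 3.5.2 If (3.5.5) is applied to the system
$$\begin{bmatrix} .986 & .579 \\ .409 & .237 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} .235 \\ .107 \end{bmatrix}$$
and $\beta = 10$ and $t = 3$, then iterative improvement produces the following sequence of computed solutions:
$$\hat{x} = \begin{bmatrix} 2.11 \\ -3.17 \end{bmatrix}, \begin{bmatrix} 1.99 \\ -2.99 \end{bmatrix}, \begin{bmatrix} 2.00 \\ -3.00 \end{bmatrix}, \ldots$$
The exact solution is $x = [2, -3]^T$.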
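Mixed precision iterative improvement can be imitated in Python. In the sketch below we assume that rounding every operation to t significant decimal digits via string formatting is an adequate model of β = 10, t-digit arithmetic, while ordinary Python floats (about 16 digits) play the role of the double precision residual. The helper names are hypothetical, and the iterates need not match Example 3.5.2 digit for digit, since the rounding model is only approximate.

```python
def fl(x, t):
    """Round x to t significant decimal digits: a crude model of base-10,
    t-digit arithmetic (round to nearest applied to the binary value of x)."""
    return float(f"{x:.{t - 1}e}")


def lu_pp(A, t):
    """PA = LU with partial pivoting, every operation rounded to t digits."""
    A = [row[:] for row in A]
    n = len(A)
    perm = list(range(n))
    for k in range(n - 1):
        mu = max(range(k, n), key=lambda i: abs(A[i][k]))
        A[k], A[mu] = A[mu], A[k]
        perm[k], perm[mu] = perm[mu], perm[k]
        for i in range(k + 1, n):
            A[i][k] = fl(A[i][k] / A[k][k], t)
            for j in range(k + 1, n):
                A[i][j] = fl(A[i][j] - fl(A[i][k] * A[k][j], t), t)
    return A, perm


def lu_solve(LU, perm, r, t):
    """Solve Ly = Pr and then Uz = y in t-digit arithmetic."""
    n = len(LU)
    y = [r[perm[i]] for i in range(n)]
    for i in range(n):
        for j in range(i):
            y[i] = fl(y[i] - fl(LU[i][j] * y[j], t), t)
    for i in reversed(range(n)):
        for j in range(i + 1, n):
            y[i] = fl(y[i] - fl(LU[i][j] * y[j], t), t)
        y[i] = fl(y[i] / LU[i][i], t)
    return y


def mixed_precision_refinement(A, b, t, steps):
    """Iteration (3.5.5): factor once in t-digit arithmetic, form each residual
    in double precision, round it to t digits, and correct x."""
    n = len(b)
    LU, perm = lu_pp(A, t)
    x = [0.0] * n
    iterates = []
    for _ in range(steps):
        r = [fl(b[i] - sum(A[i][j] * x[j] for j in range(n)), t) for i in range(n)]
        z = lu_solve(LU, perm, r, t)
        x = [fl(x[i] + z[i], t) for i in range(n)]
        iterates.append(x)
    return iterates


A = [[0.986, 0.579], [0.409, 0.237]]
b = [0.235, 0.107]
for x in mixed_precision_refinement(A, b, t=3, steps=4):
    print(x)
```

Starting from x = 0 makes the first pass an ordinary t-digit solve; each later pass only costs a residual and two triangular solves.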


3.5.4 Condition Estimation 


Suppose that we have solved $Ax = b$ via $PA = LU$ and that we now wish to ascertain the number of correct digits in the computed solution $\hat{x}$. It follows from Heuristic II that in order to do this we need an estimate of the condition $\kappa_\infty(A) = \|A\|_\infty\|A^{-1}\|_\infty$. Computing $\|A\|_\infty$ poses no problem as we merely use the formula
$$\|A\|_\infty = \max_{1 \le i \le n}\sum_{j=1}^{n}|a_{ij}|.$$
The challenge is with respect to the factor $\|A^{-1}\|_\infty$. Conceivably, we could estimate this quantity by $\|\hat{X}\|_\infty$, where $\hat{X} = [\hat{x}_1, \ldots, \hat{x}_n]$ and $\hat{x}_i$ is the computed solution to $Ax_i = e_i$. (See §3.4.11.) The trouble with this approach is its expense: $\hat{\kappa}_\infty = \|A\|_\infty\|\hat{X}\|_\infty$ costs about three times as much as $\hat{x}$.

The central problem of condition estimation is how to estimate the condition number in $O(n^2)$ flops assuming the availability of $PA = LU$ or some of the factorizations that are presented in subsequent chapters. An approach described in Forsythe and Moler (SLE, p. 51) is based on iterative improvement and the heuristic $u\kappa_\infty(A) \approx \|z\|_\infty/\|x\|_\infty$ where z is the first correction of x in (3.5.5). While the resulting condition estimator is $O(n^2)$, it suffers from the shortcoming of iterative improvement, namely, machine dependency.

Cline, Moler, Stewart, and Wilkinson (1979) have proposed a very successful approach to the condition estimation problem without this flaw. It is based on exploitation of the implication
$$Ay = d \;\Longrightarrow\; \|A^{-1}\|_\infty \ge \|y\|_\infty/\|d\|_\infty.$$
The idea behind their estimator is to choose d so that the solution y is large in norm and then set
$$\hat{\kappa}_\infty = \|A\|_\infty\|y\|_\infty/\|d\|_\infty.$$
The success of this method hinges on how close the ratio $\|y\|_\infty/\|d\|_\infty$ is to its maximum value $\|A^{-1}\|_\infty$.

Consider the case when $A = T$ is upper triangular. The relation between d and y is completely specified by the following column version of back substitution:

p(1:n) = 0
for k = n:-1:1
    Choose d(k).
    y(k) = (d(k) - p(k))/T(k,k)                            (3.5.6)
    p(1:k-1) = p(1:k-1) + y(k)T(1:k-1,k)
end


Normally, we use this algorithm to solve a given triangular system Ty = d. 
Now, however, we are free to pick the right-hand side d subject to the 
"constraint" that y is large relative to d. 

One way to encourage growth in y is to choose $d(k)$ from the set $\{-1, +1\}$ so as to maximize $|y(k)|$. If $p(k) > 0$, then set $d(k) = -1$. If $p(k) < 0$, then set $d(k) = +1$. In other words, (3.5.6) is invoked with $d(k) = -\mathrm{sign}(p(k))$. Since d is then a vector of the form $d(1:n) = (\pm 1, \ldots, \pm 1)^T$, we obtain the estimator $\hat{\kappa}_\infty = \|T\|_\infty\|y\|_\infty$.

A more reliable estimator results if $d(k) \in \{-1, +1\}$ is chosen so as to encourage growth both in $y(k)$ and the updated running sum given by $p(1:k-1) + T(1:k-1,k)y(k)$. In particular, at step k we compute
$$\begin{aligned}
y(k)^+ &= (1 - p(k))/T(k,k) \\
s(k)^+ &= |y(k)^+| + \|p(1:k-1) + T(1:k-1,k)y(k)^+\|_1 \\
y(k)^- &= (-1 - p(k))/T(k,k) \\
s(k)^- &= |y(k)^-| + \|p(1:k-1) + T(1:k-1,k)y(k)^-\|_1
\end{aligned}$$
and set
$$y(k) = \begin{cases} y(k)^+ & \text{if } s(k)^+ \ge s(k)^- \\ y(k)^- & \text{if } s(k)^+ < s(k)^-. \end{cases}$$
This gives


Algorithm 3.5.1 (Condition Estimator) Let $T \in \mathbb{R}^{n\times n}$ be a nonsingular upper triangular matrix. This algorithm computes unit ∞-norm y and a scalar κ so $\|Ty\|_\infty \approx 1/\|T^{-1}\|_\infty$ and $\kappa \approx \kappa_\infty(T)$.

p(1:n) = 0
for k = n:-1:1
    y(k)^+ = (1 - p(k))/T(k,k)
    y(k)^- = (-1 - p(k))/T(k,k)
    p(k)^+ = p(1:k-1) + T(1:k-1,k)y(k)^+
    p(k)^- = p(1:k-1) + T(1:k-1,k)y(k)^-
    if |y(k)^+| + ‖p(k)^+‖_1 ≥ |y(k)^-| + ‖p(k)^-‖_1
        y(k) = y(k)^+
        p(1:k-1) = p(k)^+
    else
        y(k) = y(k)^-
        p(1:k-1) = p(k)^-
    end
end
κ = ‖y‖_∞‖T‖_∞
y = y/‖y‖_∞


The algorithm involves several times the work of ordinary back substitution. 
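Algorithm 3.5.1 translates almost line for line into Python. The sketch below (hypothetical name `cond_est_upper`, 0-based indices) returns the estimate κ together with the normalized vector y.

```python
def cond_est_upper(T):
    """Sketch of Algorithm 3.5.1: estimate kappa_inf(T) for a nonsingular upper
    triangular T. Returns (kappa, y) with ||y||_inf = 1."""
    n = len(T)
    p = [0.0] * n
    y = [0.0] * n
    for k in reversed(range(n)):
        yp = (1 - p[k]) / T[k][k]                  # choice d(k) = +1
        ym = (-1 - p[k]) / T[k][k]                 # choice d(k) = -1
        pp = [p[i] + T[i][k] * yp for i in range(k)]
        pm = [p[i] + T[i][k] * ym for i in range(k)]
        # Prefer the choice that makes both y(k) and the running sum large.
        if abs(yp) + sum(map(abs, pp)) >= abs(ym) + sum(map(abs, pm)):
            y[k] = yp
            p[:k] = pp
        else:
            y[k] = ym
            p[:k] = pm
    ynorm = max(map(abs, y))
    tnorm = max(sum(map(abs, row)) for row in T)
    return ynorm * tnorm, [v / ynorm for v in y]


kappa, y = cond_est_upper([[1.0, -1.0, -1.0], [0.0, 1.0, -1.0], [0.0, 0.0, 1.0]])
```

For this particular T the true value is $\kappa_\infty(T) = 3 \cdot 4 = 12$, and the estimator happens to attain it; in general the estimate is only a lower bound.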

We are now in a position to describe a procedure for estimating the 
condition of a square nonsingular matrix A whose PA = LU factorization 
we know: 


• Apply the lower triangular version of Algorithm 3.5.1 to $U^T$ and obtain a large norm solution to $U^Ty = d$.
• Solve the triangular systems $L^Tr = y$, $Lw = Pr$, and $Uz = w$.
• $\hat{\kappa}_\infty = \|A\|_\infty\|z\|_\infty/\|r\|_\infty$.

Note that $\|z\|_\infty \le \|A^{-1}\|_\infty\|r\|_\infty$. The method is based on several heuristics. First, if A is ill-conditioned and $PA = LU$, then it is usually the case that U is correspondingly ill-conditioned. The lower triangle L tends to be fairly well-conditioned. Thus, it is more profitable to apply the condition estimator to U than to L. The vector r, because it solves $A^TP^Tr = d$, tends to be rich in the direction of the left singular vector associated with $\sigma_{\min}(A)$. Right-hand sides with this property render large solutions to the problem $Az = r$.

In practice, it is found that the condition estimation technique that we 
have outlined produces good order-of-magnitude estimates of the actual 
condition number. 
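The complete procedure for a general nonsingular A can be sketched as follows. It is our own illustration (hypothetical function names, 0-based indices, partial pivoting with whole-row swaps), not a library routine. Since $z = A^{-1}r$, the value it returns cannot exceed the true $\kappa_\infty(A)$ apart from roundoff.

```python
def lu_factor(A):
    """PA = LU with partial pivoting; packed factors and perm with (PA)[i] = A[perm[i]]."""
    A = [row[:] for row in A]
    n = len(A)
    perm = list(range(n))
    for k in range(n - 1):
        mu = max(range(k, n), key=lambda i: abs(A[i][k]))
        A[k], A[mu] = A[mu], A[k]
        perm[k], perm[mu] = perm[mu], perm[k]
        for i in range(k + 1, n):
            A[i][k] /= A[k][k]
            for j in range(k + 1, n):
                A[i][j] -= A[i][k] * A[k][j]
    return A, perm


def large_solution_ut(LU):
    """Lower triangular version of Algorithm 3.5.1 applied to U^T: choose
    d(k) in {-1, +1} so that the solution of U^T y = d grows."""
    n = len(LU)
    p = [0.0] * n
    y = [0.0] * n
    for k in range(n):
        yp = (1 - p[k]) / LU[k][k]
        ym = (-1 - p[k]) / LU[k][k]
        pp = [p[i] + LU[k][i] * yp for i in range(k + 1, n)]   # (U^T)(i,k) = U(k,i)
        pm = [p[i] + LU[k][i] * ym for i in range(k + 1, n)]
        if abs(yp) + sum(map(abs, pp)) >= abs(ym) + sum(map(abs, pm)):
            y[k], p[k + 1:] = yp, pp
        else:
            y[k], p[k + 1:] = ym, pm
    return y


def cond_est(A):
    """Estimate kappa_inf(A) = ||A||_inf ||A^{-1}||_inf from PA = LU."""
    n = len(A)
    LU, perm = lu_factor(A)
    y = large_solution_ut(LU)
    r = y[:]                                   # L^T r = y (unit upper triangular)
    for i in reversed(range(n)):
        r[i] -= sum(LU[j][i] * r[j] for j in range(i + 1, n))
    w = [r[perm[i]] for i in range(n)]         # L w = P r
    for i in range(n):
        w[i] -= sum(LU[i][j] * w[j] for j in range(i))
    z = w[:]                                   # U z = w
    for i in reversed(range(n)):
        z[i] = (z[i] - sum(LU[i][j] * z[j] for j in range(i + 1, n))) / LU[i][i]
    a_norm = max(sum(map(abs, row)) for row in A)
    return a_norm * max(map(abs, z)) / max(map(abs, r))


est = cond_est([[0.986, 0.579], [0.409, 0.237]])   # true kappa_inf is about 698
```

Each step costs $O(n^2)$ once the factors are available, which is the point of the method.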


Problems 


P3.5.1 Show by example that there may be more than one way to equilibrate a matrix. 


P3.5.2 Using $\beta = 10$, $t = 2$ arithmetic, solve

$$\begin{bmatrix} 11 & 15 \\ 5 & 7 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 7 \\ 3 \end{bmatrix}$$

using Gaussian elimination with partial pivoting. Do one step of iterative improvement
using t = 4 arithmetic to compute the residual. (Do not forget to round the computed
residual to two digits.)
P3.5.3 Suppose $P(A + E) = \hat L\hat U$, where P is a permutation, $\hat L$ is lower triangular with
$|\hat\ell_{ij}| \le 1$, and $\hat U$ is upper triangular. Show that $\kappa_\infty(A) \ge \|A\|_\infty/(\|E\|_\infty + \mu)$ where
$\mu = \min |\hat u_{ii}|$. Conclude that if a small pivot is encountered when Gaussian elimination
with pivoting is applied to A, then A is ill-conditioned. The converse is not true. (Let
$A = B_n$.)

P3.5.4 (Kahan 1966) The system Ax = b where

$$A = \begin{bmatrix} 2 & -1 & 1 \\ -1 & 10^{-10} & 10^{-10} \\ 1 & 10^{-10} & 10^{-10} \end{bmatrix}, \qquad b = \begin{bmatrix} 2(1 + 10^{-10}) \\ -10^{-10} \\ 10^{-10} \end{bmatrix}$$

has solution $x = (10^{-10}, -1, 1)^T$. (a) Show that if (A + E)y = b and $|E| \le 10^{-8}|A|$,
then $|x - y| \le 10^{-7}|x|$. That is, small relative changes in A's entries do not induce large
changes in x even though $\kappa_\infty(A) = 10^{10}$. (b) Define $D = \mathrm{diag}(10^{-5}, 10^5, 10^5)$. Show
$\kappa_\infty(DAD) \le 5$. (c) Explain what is going on in terms of Theorem 2.7.3.

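P3.5.5 Consider the matrix:

$$T = \begin{bmatrix} 1 & 0 & M & -M \\ 0 & 1 & -M & M \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}, \qquad M \in \mathbb{R}.$$

What estimate of $\kappa_\infty(T)$ is produced when (3.5.6) is applied with d(k) = -sign(p(k))?
What estimate does Algorithm 3.5.1 produce? What is the true $\kappa_\infty(T)$?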


P3.5.6 What does Algorithm 3.5.1 produce when applied to the matrix $B_n$ given in
(2.7.9)?

Chapter 4

Special Linear Systems

§4.1 The $LDM^T$ and $LDL^T$ Factorizations
§4.2 Positive Definite Systems
§4.3 Banded Systems
§4.4 Symmetric Indefinite Systems
§4.5 Block Systems
§4.6 Vandermonde Systems and the FFT
§4.7 Toeplitz and Related Systems


It is a basic tenet of numerical analysis that structure should be ex- 
ploited whenever solving a problem. In numerical linear algebra, this trans- 
lates into an expectation that algorithms for general matrix problems can 
be streamlined in the presence of such properties as symmetry, definiteness, 
and sparsity. This is the central theme of the current chapter, where our 
principal aim is to devise special algorithms for computing special variants 
of the LU factorization. 

We begin by pointing out the connection between the triangular factors
L and U when A is symmetric. This is achieved by examining the
$LDM^T$ factorization in §4.1. We then turn our attention to the important
case when A is both symmetric and positive definite, deriving the stable
Cholesky factorization in §4.2. Unsymmetric positive definite systems are
also investigated in this section. In §4.3, banded versions of Gaussian
elimination and other factorization methods are discussed. We then examine the
interesting situation when A is symmetric but indefinite. Our treatment of
this problem in §4.4 highlights the numerical analyst's ambivalence towards
pivoting. We love pivoting for the stability it induces but despise it for the
structure that it can destroy. Fortunately, there is a happy resolution to
this conflict in the symmetric indefinite problem.

Any block banded matrix is also banded and so the methods of §4.3 are
applicable. Yet, there are occasions when it pays not to adopt this point of
view. To illustrate this we consider the important case of block tridiagonal
systems in §4.5. Other block systems are discussed as well.

In the final two sections we examine some very interesting $O(n^2)$
algorithms that can be used to solve Vandermonde and Toeplitz systems.


Before You Begin 


Chapter 1, §§2.1-2.5, and §2.7, and Chapter 3 are assumed. Within this 
chapter there are the following dependencies: 


§4.5
↑
§4.1 → §4.2 → §4.3 → §4.4
↓
§4.6 → §4.7


Complementary references include George and Liu (1981), Gill, Murray, 
and Wright (1991), Higham (1996), Trefethen and Bau (1996), and Demmel 
(1996). Some MATLAB functions important to this chapter: chol, tril, 
triu, vander, toeplitz, fft. LAPACK connections include 


LAPACK: General Band Matrices
_GBSV     Solve AX = B
_GBCON    Condition estimator
_GBRFS    Improve AX = B, A^T X = B, A^H X = B solutions with error bounds
_GBSVX    Solve AX = B, A^T X = B, A^H X = B with condition estimate
_GBTRF    PA = LU
_GBTRS    Solve AX = B, A^T X = B, A^H X = B via PA = LU
_GBEQU    Equilibration

LAPACK: General Tridiagonal Matrices
_GTSV     Solve AX = B
_GTCON    Condition estimator
_GTRFS    Improve AX = B, A^T X = B, A^H X = B solutions with error bounds
_GTSVX    Solve AX = B, A^T X = B, A^H X = B with condition estimate
_GTTRF    PA = LU
_GTTRS    Solve AX = B, A^T X = B, A^H X = B via PA = LU

LAPACK: Full Symmetric Positive Definite
_POSV     Solve AX = B
_POCON    Condition estimate via A = GG^T
_PORFS    Improve AX = B solutions with error bounds
_POSVX    Solve AX = B with condition estimate
_POTRF    A = GG^T
_POTRS    Solve AX = B via A = GG^T

LAPACK: Banded Symmetric Positive Definite
_PBSV     Solve AX = B
_PBCON    Condition estimate via A = GG^T
_PBRFS    Improve AX = B solutions with error bounds
_PBSVX    Solve AX = B with condition estimate
_PBTRF    A = GG^T
_PBTRS    Solve AX = B via A = GG^T

LAPACK: Tridiagonal Symmetric Positive Definite
_PTSV     Solve AX = B
_PTCON    Condition estimate via A = LDL^T
_PTRFS    Improve AX = B solutions with error bounds
_PTSVX    Solve AX = B with condition estimate
_PTTRF    A = LDL^T
_PTTRS    Solve AX = B via A = LDL^T

LAPACK: Full Symmetric Indefinite
_SYSV     Solve AX = B
_SYCON    Condition estimate via PAP^T = LDL^T
_SYRFS    Improve AX = B solutions with error bounds
_SYSVX    Solve AX = B with condition estimate
_SYTRF    PAP^T = LDL^T
_SYTRS    Solve AX = B via PAP^T = LDL^T

LAPACK: Triangular Banded Matrices
_TBCON    Condition estimate
_TBRFS    Improve AX = B, A^T X = B solutions with error bounds
_TBTRS    Solve AX = B, A^T X = B


4.1 The $LDM^T$ and $LDL^T$ Factorizations

We want to develop a structure-exploiting method for solving symmetric
Ax = b problems. To do this we establish a variant of the LU factorization
in which A is factored into a three-matrix product $LDM^T$ where D is
diagonal and L and M are unit lower triangular. Once this factorization is
obtained, the solution to Ax = b may be found in $O(n^2)$ flops by solving
Ly = b (forward elimination), Dz = y, and $M^Tx = z$ (back substitution).
The reason for developing the $LDM^T$ factorization is to set the stage for
the symmetric case, for if $A = A^T$ then L = M and the work associated
with the factorization is half of that required by Gaussian elimination. The
issue of pivoting is taken up in subsequent sections.


4.1.1 The $LDM^T$ Factorization

Our first result connects the $LDM^T$ factorization with the LU factorization.

Theorem 4.1.1 If all the leading principal submatrices of $A \in \mathbb{R}^{n\times n}$ are
nonsingular, then there exist unique unit lower triangular matrices L and M
and a unique diagonal matrix $D = \mathrm{diag}(d_1, \ldots, d_n)$ such that $A = LDM^T$.

Proof. By Theorem 3.2.1 we know that A has an LU factorization A = LU.
Set $D = \mathrm{diag}(d_1, \ldots, d_n)$ with $d_i = u_{ii}$ for i = 1:n. Notice that D is
nonsingular and that $M^T = D^{-1}U$ is unit upper triangular. Thus,
$A = LU = LD(D^{-1}U) = LDM^T$. Uniqueness follows from the uniqueness of the LU
factorization as described in Theorem 3.2.1. □


The proof shows that the $LDM^T$ factorization can be found by using Gaussian
elimination to compute A = LU and then determining D and M from
the equation $U = DM^T$. However, an interesting alternative algorithm can
be derived by computing L, D, and M directly.

Assume that we know the first j - 1 columns of L, diagonal entries
$d_1, \ldots, d_{j-1}$ of D, and the first j - 1 rows of M for some j with $1 \le j \le n$.
To develop recipes for L(j + 1:n, j), M(j, 1:j - 1), and $d_j$ we equate jth
columns in the equation $A = LDM^T$. In particular,

$$A(1{:}n, j) = Lv \qquad (4.1.1)$$

where $v = DM^Te_j$. The "top" half of (4.1.1) defines v(1:j) as the solution
of a known lower triangular system:

$$L(1{:}j, 1{:}j)\,v(1{:}j) = A(1{:}j, j).$$

Once we know v then we compute

$$d(j) = v(j), \qquad M(j, i) = v(i)/d(i), \quad i = 1{:}j-1.$$

The "bottom" half of (4.1.1) says L(j + 1:n, 1:j)v(1:j) = A(j + 1:n, j), which
can be rearranged to obtain a recipe for the jth column of L:

$$L(j{+}1{:}n, j)\,v(j) = A(j{+}1{:}n, j) - L(j{+}1{:}n, 1{:}j{-}1)\,v(1{:}j{-}1).$$

Thus, L(j + 1:n, j) is a scaled gaxpy operation and overall we obtain

for j = 1:n
    Solve L(1:j, 1:j)v(1:j) = A(1:j, j) for v(1:j).
    for i = 1:j - 1
        M(j, i) = v(i)/d(i)                                        (4.1.2)
    end
    d(j) = v(j)
    L(j + 1:n, j) = (A(j + 1:n, j) - L(j + 1:n, 1:j - 1)v(1:j - 1))/v(j)
end
As with the LU factorization, it is possible to overwrite A with the L, D,
and M factors. If the column version of forward elimination is used to solve
for v(1:j) then we obtain the following procedure:

Algorithm 4.1.1 ($LDM^T$) If $A \in \mathbb{R}^{n\times n}$ has an LU factorization then
this algorithm computes unit lower triangular matrices L and M and a
diagonal matrix $D = \mathrm{diag}(d_1, \ldots, d_n)$ such that $A = LDM^T$. The entry
$a_{ij}$ is overwritten with $\ell_{ij}$ if i > j, with $d_i$ if i = j, and with $m_{ji}$ if i < j.

for j = 1:n
    { Solve L(1:j, 1:j)v(1:j) = A(1:j, j). }
    v(1:j) = A(1:j, j)
    for k = 1:j - 1
        v(k + 1:j) = v(k + 1:j) - v(k)A(k + 1:j, k)
    end
    { Compute M(j, 1:j - 1) and store in A(1:j - 1, j). }
    for i = 1:j - 1
        A(i, j) = v(i)/A(i, i)
    end
    { Store d(j) in A(j, j). }
    A(j, j) = v(j)
    { Compute L(j + 1:n, j) and store in A(j + 1:n, j). }
    for k = 1:j - 1
        A(j + 1:n, j) = A(j + 1:n, j) - v(k)A(j + 1:n, k)
    end
    A(j + 1:n, j) = A(j + 1:n, j)/v(j)
end

This algorithm involves the same amount of work as the LU factorization,
about $2n^3/3$ flops.
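A direct transcription of Algorithm 4.1.1 into Python may help in checking the index bookkeeping. This is a sketch under our own conventions (0-based indices, A as a list of row lists, hypothetical name `ldmt_inplace`); like the algorithm, it does no pivoting.

```python
# Illustrative sketch of Algorithm 4.1.1; the function name is hypothetical.
def ldmt_inplace(A):
    """Overwrite A with its LDM^T factors: afterwards A[i][j] holds l_ij for
    i > j, d_i for i == j, and m_ji for i < j. A must have an LU factorization."""
    n = len(A)
    for j in range(n):
        # Solve L(0:j, 0:j) v = A(0:j, j) by column-oriented forward elimination
        v = [A[i][j] for i in range(j + 1)]
        for k in range(j):
            for i in range(k + 1, j + 1):
                v[i] -= v[k] * A[i][k]
        # M(j, 0:j-1) is stored in A(0:j-1, j); d(i) already sits in A(i, i)
        for i in range(j):
            A[i][j] = v[i] / A[i][i]
        # d(j) is stored in A(j, j)
        A[j][j] = v[j]
        # L(j+1:n, j) is stored in A(j+1:n, j)
        for k in range(j):
            for i in range(j + 1, n):
                A[i][j] -= v[k] * A[i][k]
        for i in range(j + 1, n):
            A[i][j] /= v[j]
    return A
```

Applied to the matrix of Example 4.1.1 it leaves [[10, 1, 2], [2, 5, 0], [3, 4, 1]] in A.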

The computed solution $\hat x$ to Ax = b obtained via Algorithm 4.1.1 and
the usual triangular system solvers of §3.1 can be shown to satisfy a
perturbed system $(A + E)\hat x = b$, where

$$|E| \le nu\left(3|A| + 5|\hat L||\hat D||\hat M^T|\right) + O(u^2) \qquad (4.1.3)$$

and $\hat L$, $\hat D$, and $\hat M$ are the computed versions of L, D, and M, respectively.

As in the case of the LU factorization considered in the previous chapter,
the upper bound in (4.1.3) is without limit unless some form of pivoting is
done. Hence, for Algorithm 4.1.1 to be a practical procedure, it must be
modified so as to compute a factorization of the form $PA = LDM^T$, where
P is a permutation matrix chosen so that the entries in L satisfy $|\ell_{ij}| \le 1$.
The details of this are not pursued here since they are straightforward and
since our main object for introducing the $LDM^T$ factorization is to motivate
special methods for symmetric systems.
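Example 4.1.1

$$A = \begin{bmatrix} 10 & 10 & 20 \\ 20 & 25 & 40 \\ 30 & 50 & 61 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 3 & 4 & 1 \end{bmatrix}\begin{bmatrix} 10 & 0 & 0 \\ 0 & 5 & 0 \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 1 & 2 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

and upon completion, Algorithm 4.1.1 overwrites A as follows:

$$A = \begin{bmatrix} 10 & 1 & 2 \\ 2 & 5 & 0 \\ 3 & 4 & 1 \end{bmatrix}.$$

4.1.2 Symmetry and the $LDL^T$ Factorization

There is redundancy in the $LDM^T$ factorization if A is symmetric.

Theorem 4.1.2 If $A = LDM^T$ is the $LDM^T$ factorization of a nonsingular
symmetric matrix A, then L = M.

Proof. The matrix $M^{-1}AM^{-T} = M^{-1}LD$ is both symmetric and lower
triangular and therefore diagonal. Since D is nonsingular, this implies
that $M^{-1}L$ is also diagonal. But $M^{-1}L$ is unit lower triangular and so
$M^{-1}L = I$. □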


In view of this result, it is possible to halve the work in Algorithm 4.1.1
when it is applied to a symmetric matrix. In the jth step we already know
M(j, 1:j - 1) since M = L and we presume knowledge of L's first j - 1
columns. Recall that in the jth step of (4.1.2) the vector v(1:j) is defined
by the first j components of $DM^Te_j$. Since M = L, this says that

$$v(1{:}j) = \begin{bmatrix} d(1)L(j, 1) \\ \vdots \\ d(j-1)L(j, j-1) \\ d(j) \end{bmatrix}.$$

Hence, the vector v(1:j - 1) can be obtained by a simple scaling of L's jth
row. The formula v(j) = A(j, j) - L(j, 1:j - 1)v(1:j - 1) can be derived
from the jth equation in L(1:j, 1:j)v = A(1:j, j) rendering

for j = 1:n
    for i = 1:j - 1
        v(i) = L(j, i)d(i)
    end
    v(j) = A(j, j) - L(j, 1:j - 1)v(1:j - 1)
    d(j) = v(j)
    L(j + 1:n, j) = (A(j + 1:n, j) - L(j + 1:n, 1:j - 1)v(1:j - 1))/v(j)
end

With overwriting this becomes


Algorithm 4.1.2 ($LDL^T$) If $A \in \mathbb{R}^{n\times n}$ is symmetric and has an LU
factorization then this algorithm computes a unit lower triangular matrix
L and a diagonal matrix $D = \mathrm{diag}(d_1, \ldots, d_n)$ so $A = LDL^T$. The entry
$a_{ij}$ is overwritten with $\ell_{ij}$ if i > j and with $d_i$ if i = j.

for j = 1:n
    { Compute v(1:j). }
    for i = 1:j - 1
        v(i) = A(j, i)A(i, i)
    end
    v(j) = A(j, j) - A(j, 1:j - 1)v(1:j - 1)
    { Store d(j) and compute L(j + 1:n, j). }
    A(j, j) = v(j)
    A(j + 1:n, j) = (A(j + 1:n, j) - A(j + 1:n, 1:j - 1)v(1:j - 1))/v(j)
end

This algorithm requires $n^3/3$ flops, about half the number of flops involved
in Gaussian elimination.
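The same loop can be sketched in Python to show that only the lower triangle of A is touched. The sketch assumes a symmetric list-of-rows matrix that has an LU factorization, uses 0-based indices, and the name `ldlt_inplace` is hypothetical.

```python
# Illustrative sketch of Algorithm 4.1.2; the function name is hypothetical.
def ldlt_inplace(A):
    """Overwrite the lower triangle of symmetric A: afterwards A[i][j] holds
    l_ij for i > j and d_i for i == j. Entries above the diagonal are untouched."""
    n = len(A)
    v = [0.0] * n
    for j in range(n):
        # v(i) = L(j, i) d(i) for i < j
        for i in range(j):
            v[i] = A[j][i] * A[i][i]
        # v(j) = A(j, j) - L(j, 0:j-1) v(0:j-1), which is d(j)
        v[j] = A[j][j] - sum(A[j][i] * v[i] for i in range(j))
        A[j][j] = v[j]
        # L(j+1:n, j) = (A(j+1:n, j) - L(j+1:n, 0:j-1) v(0:j-1)) / v(j)
        for i in range(j + 1, n):
            A[i][j] = (A[i][j] - sum(A[i][k] * v[k] for k in range(j))) / v[j]
    return A
```

Applied to the symmetric matrix [[10, 20, 30], [20, 45, 80], [30, 80, 171]], it leaves [[10, 20, 30], [2, 5, 80], [3, 4, 1]] in A: L and D below and on the diagonal, the original upper triangle above.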

In the next section, we show that if A is both symmetric and positive 
definite, then Algorithm 4.1.2 not only runs to completion, but is extremely 
stable. If A is symmetric but not positive definite, then pivoting may be
necessary and the methods of §4.4 are relevant. 


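Example 4.1.2

$$A = \begin{bmatrix} 10 & 20 & 30 \\ 20 & 45 & 80 \\ 30 & 80 & 171 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 3 & 4 & 1 \end{bmatrix}\begin{bmatrix} 10 & 0 & 0 \\ 0 & 5 & 0 \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 2 & 3 \\ 0 & 1 & 4 \\ 0 & 0 & 1 \end{bmatrix}$$

and so if Algorithm 4.1.2 is applied, A is overwritten by

$$A = \begin{bmatrix} 10 & 20 & 30 \\ 2 & 5 & 80 \\ 3 & 4 & 1 \end{bmatrix}.$$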


Problems 


P4.1.1 Show that the $LDM^T$ factorization of a nonsingular A is unique if it exists.
P4.1.2 Modify Algorithm 4.1.1 so that it computes a factorization of the form
$PA = LDM^T$, where L and M are both unit lower triangular, D is diagonal, and P is a
permutation that is chosen so $|\ell_{ij}| \le 1$.

P4.1.3 Suppose the n-by-n symmetric matrix $A = (a_{ij})$ is stored in a vector c as
follows: $c = (a_{11}, a_{21}, \ldots, a_{n1}, a_{22}, \ldots, a_{n2}, \ldots, a_{nn})$. Rewrite Algorithm 4.1.2 with A
stored in this fashion. Get as much indexing outside the inner loops as possible.

P4.1.4 Rewrite Algorithm 4.1.2 for A stored by diagonal. See §1.2.8.


Notes and References for Sec. 4.1 


Algorithm 4.1.1 is related to the methods of Crout and Doolittle in that outer product
updates are avoided. See Chapter 4 of Fox (1964) or Stewart (1973, 131-149). An Algol
procedure may be found in

H.J. Bowdler, R.S. Martin, G. Peters, and J.H. Wilkinson (1966). "Solution of Real and
Complex Systems of Linear Equations," Numer. Math. 8, 217-234.

See also

G.E. Forsythe (1960). "Crout with Pivoting," Comm. ACM 3, 501-08.
W.M. McKeeman (1962). "Crout with Equilibration and Iteration," Comm. ACM 5,
553-55.

Just as algorithms can be tailored to exploit structure, so can error analysis and
perturbation theory:

M. Arioli, J. Demmel, and I. Duff (1989). "Solving Sparse Linear Systems with Sparse
Backward Error," SIAM J. Matrix Anal. Appl. 10, 165-190.

J.R. Bunch, J.W. Demmel, and C.F. Van Loan (1989). "The Strong Stability of Algorithms
for Solving Symmetric Linear Systems," SIAM J. Matrix Anal. Appl. 10, 494-499.

A. Barrlund (1991). "Perturbation Bounds for the $LDL^T$ and LU Decompositions,"
BIT 31, 358-363.

D.J. Higham and N.J. Higham (1992). "Backward Error and Condition of Structured
Linear Systems," SIAM J. Matrix Anal. Appl. 13, 162-175.


4.2 Positive Definite Systems

A matrix $A \in \mathbb{R}^{n\times n}$ is positive definite if $x^TAx > 0$ for all nonzero $x \in \mathbb{R}^n$.
Positive definite systems constitute one of the most important classes of
special Ax = b problems. Consider the 2-by-2 symmetric case. If

$$A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$$

is positive definite then

$$\begin{aligned}
x = (1, 0)^T &\Rightarrow x^TAx = a_{11} > 0 \\
x = (0, 1)^T &\Rightarrow x^TAx = a_{22} > 0 \\
x = (1, 1)^T &\Rightarrow x^TAx = a_{11} + 2a_{12} + a_{22} > 0 \\
x = (1, -1)^T &\Rightarrow x^TAx = a_{11} - 2a_{12} + a_{22} > 0.
\end{aligned}$$

The last two equations imply $|a_{12}| \le (a_{11} + a_{22})/2$. From these results we
see that the largest entry in A is on the diagonal and that it is positive. This
turns out to be true in general. A symmetric positive definite matrix has
a "weighty" diagonal. The mass on the diagonal is not blatantly obvious
as in the case of diagonal dominance but it has the same effect in that it
precludes the need for pivoting. See §3.4.10.

We begin with a few comments about the property of positive definite- 
ness and what it implies in the unsymmetric case with respect to pivoting. 
We then focus on the efficient organization of the Cholesky procedure which 
can be used to safely factor a symmetric positive definite A. Gaxpy, outer 
product, and block versions are developed. The section concludes with a 
few comments about the semidefinite case. 


4.2.1 Positive Definiteness

Suppose $A \in \mathbb{R}^{n\times n}$ is positive definite. It is obvious that a positive definite
matrix is nonsingular for otherwise we could find a nonzero x so $x^TAx = 0$.
However, much more is implied by the positivity of the quadratic form
$x^TAx$ as the following results show.


Theorem 4.2.1 If $A \in \mathbb{R}^{n\times n}$ is positive definite and $X \in \mathbb{R}^{n\times k}$ has rank
k, then $B = X^TAX \in \mathbb{R}^{k\times k}$ is also positive definite.

Proof. If $z \in \mathbb{R}^k$ satisfies $0 \ge z^TBz = (Xz)^TA(Xz)$ then Xz = 0. But
since X has full column rank, this implies that z = 0. □

Corollary 4.2.2 If A is positive definite then all its principal submatrices
are positive definite. In particular, all the diagonal entries are positive.

Proof. If $v \in \mathbb{R}^k$ is an integer vector with $1 \le v_1 < \cdots < v_k \le n$, then
$X = I_n(:, v)$ is a rank k matrix made up of columns $v_1, \ldots, v_k$ of the identity.
It follows from Theorem 4.2.1 that $A(v, v) = X^TAX$ is positive definite. □

Corollary 4.2.3 If A is positive definite then the factorization $A = LDM^T$
exists and $D = \mathrm{diag}(d_1, \ldots, d_n)$ has positive diagonal entries.

Proof. From Corollary 4.2.2, it follows that the submatrices A(1:k, 1:k)
are nonsingular for k = 1:n and so from Theorem 4.1.1 the factorization
$A = LDM^T$ exists. If we apply Theorem 4.2.1 with $X = L^{-T}$ then
$B = DM^TL^{-T} = L^{-1}AL^{-T}$ is positive definite. Since $M^TL^{-T}$ is unit upper
triangular, B and D have the same diagonal and it must be positive. □
There are several typical situations that give rise to positive definite
matrices in practice:

• The quadratic form is an energy function whose positivity is guaranteed
from physical principles.

• The matrix A equals a cross-product $X^TX$ where X has full column
rank. (Positive definiteness follows by setting $A = I_n$ in Theorem 4.2.1.)

• Both A and $A^T$ are diagonally dominant and each $a_{ii}$ is positive.
4.2.2 Unsymmetric Positive Definite Systems

The mere existence of an $LDM^T$ factorization does not mean that its
computation is advisable because the resulting factors may have unacceptably
large elements. For example, if $\epsilon > 0$ then the matrix

$$A = \begin{bmatrix} \epsilon & m \\ -m & \epsilon \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ -m/\epsilon & 1 \end{bmatrix}\begin{bmatrix} \epsilon & 0 \\ 0 & \epsilon + m^2/\epsilon \end{bmatrix}\begin{bmatrix} 1 & m/\epsilon \\ 0 & 1 \end{bmatrix}$$

is positive definite. But if $m/\epsilon \gg 1$, then pivoting is recommended.
The following result suggests when to expect element growth in the
$LDM^T$ factorization of a positive definite matrix.


Theorem 4.2.4 Let $A \in \mathbb{R}^{n\times n}$ be positive definite and set $T = (A + A^T)/2$
and $S = (A - A^T)/2$. If $A = LDM^T$, then

$$\|\,|L||D||M^T|\,\|_F \le n\left(\|T\|_2 + \|ST^{-1}S\|_2\right) \qquad (4.2.1)$$

Proof. See Golub and Van Loan (1979). □

The theorem suggests when it is safe not to pivot. Assume that the computed
factors $\hat L$, $\hat D$, and $\hat M$ satisfy:

$$\|\,|\hat L||\hat D||\hat M^T|\,\|_F \le c\,\|\,|L||D||M^T|\,\|_F \qquad (4.2.2)$$

where c is a constant of modest size. It follows from (4.2.1) and the analysis
in §3.3 that if these factors are used to compute a solution to Ax = b, then
the computed solution $\hat x$ satisfies $(A + E)\hat x = b$ with

$$\|E\|_F \le u\left(3n\|A\|_F + 5cn^2\left(\|T\|_2 + \|ST^{-1}S\|_2\right)\right) + O(u^2). \qquad (4.2.3)$$

It is easy to show that $\|T\|_2 \le \|A\|_2$, and so it follows that if

$$\Omega = \frac{\|ST^{-1}S\|_2}{\|A\|_2} \qquad (4.2.4)$$

is not too large then it is safe not to pivot. In other words, the norm of the
skew part S has to be modest relative to the condition of the symmetric
part T. Sometimes it is possible to estimate $\Omega$ in an application. This is
trivially the case when A is symmetric for then $\Omega = 0$.


4.2.3 Symmetric Positive Definite Systems 


When we apply the above results to a symmetric positive definite system 
we know that the factorization A = LDLT exists and moreover is stable to 
compute. However, in this situation another factorization is available. 
Theorem 4.2.5 (Cholesky Factorization) If $A \in \mathbb{R}^{n\times n}$ is symmetric
positive definite, then there exists a unique lower triangular $G \in \mathbb{R}^{n\times n}$ with
positive diagonal entries such that $A = GG^T$.

Proof. From Theorem 4.1.2, there exists a unit lower triangular L and a
diagonal $D = \mathrm{diag}(d_1, \ldots, d_n)$ such that $A = LDL^T$. Since the $d_k$ are
positive (Corollary 4.2.3), the matrix $G = L\,\mathrm{diag}(\sqrt{d_1}, \ldots, \sqrt{d_n})$ is real
lower triangular with positive diagonal entries. It also satisfies $A = GG^T$.
Uniqueness follows from the uniqueness of the $LDL^T$ factorization. □


The factorization $A = GG^T$ is known as the Cholesky factorization and G
is referred to as the Cholesky triangle. Note that if we compute the Cholesky
factorization and solve the triangular systems Gy = b and $G^Tx = y$, then
$b = Gy = G(G^Tx) = (GG^T)x = Ax$.

Our proof of the Cholesky factorization in Theorem 4.2.5 is constructive. 
However, more effective methods for computing the Cholesky triangle can 
be derived by manipulating the equation A = GGT. This can be done in 
several ways as we show in the next few subsections. 


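Example 4.2.1 The matrix

$$\begin{bmatrix} 2 & -2 \\ -2 & 5 \end{bmatrix} = \begin{bmatrix} \sqrt{2} & 0 \\ -\sqrt{2} & \sqrt{3} \end{bmatrix}\begin{bmatrix} \sqrt{2} & -\sqrt{2} \\ 0 & \sqrt{3} \end{bmatrix}$$

is positive definite.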


4.2.4 Gaxpy Cholesky

We first derive an implementation of Cholesky that is rich in the gaxpy
operation. If we compare jth columns in the equation $A = GG^T$ then we
obtain

$$A(:, j) = \sum_{k=1}^{j} G(j, k)G(:, k).$$

This says that

$$G(j, j)G(:, j) = A(:, j) - \sum_{k=1}^{j-1} G(j, k)G(:, k) \equiv v. \qquad (4.2.5)$$

If we know the first j - 1 columns of G, then v is computable. It follows
by equating components in (4.2.5) that

$$G(j{:}n, j) = v(j{:}n)/\sqrt{v(j)}.$$

This is a scaled gaxpy operation and so we obtain the following gaxpy-based
method for computing the Cholesky factorization:

for j = 1:n
    v(j:n) = A(j:n, j)
    for k = 1:j - 1
        v(j:n) = v(j:n) - G(j, k)G(j:n, k)
    end
    G(j:n, j) = v(j:n)/√v(j)
end


It is possible to arrange the computations so that G overwrites the lower
triangle of A.

Algorithm 4.2.1 (Cholesky: Gaxpy Version) Given a symmetric
positive definite $A \in \mathbb{R}^{n\times n}$, the following algorithm computes a lower
triangular $G \in \mathbb{R}^{n\times n}$ such that $A = GG^T$. For all $i \ge j$, G(i, j) overwrites
A(i, j).

for j = 1:n
    if j > 1
        A(j:n, j) = A(j:n, j) - A(j:n, 1:j - 1)A(j, 1:j - 1)^T
    end
    A(j:n, j) = A(j:n, j)/√A(j, j)
end

This algorithm requires $n^3/3$ flops.
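The gaxpy formulation translates almost line for line into Python. The sketch below assumes a list-of-rows matrix and 0-based indices; the names `chol_gaxpy` and `chol_solve` are hypothetical. The check for a nonpositive pivot is our addition (the algorithm simply takes the square root), and `chol_solve` carries out the two triangular solves Gy = b and $G^Tx = y$.

```python
import math

# Illustrative sketch of Algorithm 4.2.1; function names are hypothetical.
def chol_gaxpy(A):
    """Overwrite A(i, j), i >= j, with G(i, j) where A = G G^T.
    The strict upper triangle is left untouched."""
    n = len(A)
    for j in range(n):
        # A(j:n, j) = A(j:n, j) - A(j:n, 0:j-1) A(j, 0:j-1)^T
        for i in range(j, n):
            A[i][j] -= sum(A[i][k] * A[j][k] for k in range(j))
        if A[j][j] <= 0.0:             # added safeguard, not part of the text
            raise ValueError("matrix is not positive definite")
        s = math.sqrt(A[j][j])
        for i in range(j, n):
            A[i][j] /= s
    return A

def chol_solve(G, b):
    """Solve (G G^T) x = b using the lower triangle of G."""
    n = len(b)
    y = list(b)
    for i in range(n):                 # G y = b (forward substitution)
        y[i] = (y[i] - sum(G[i][k] * y[k] for k in range(i))) / G[i][i]
    x = y[:]
    for i in range(n - 1, -1, -1):     # G^T x = y (back substitution)
        x[i] = (x[i] - sum(G[k][i] * x[k] for k in range(i + 1, n))) / G[i][i]
    return x
```

For A = [[4, 2], [2, 5]] the sketch produces G = [[2, 0], [1, 2]], and `chol_solve(G, [2, 3])` returns x = (0.25, 0.5). For the indefinite matrix [[1, 2], [2, 1]] it stops at j = 1 with a `ValueError`.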


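4.2.5 Outer Product Cholesky

An alternative Cholesky procedure based on outer product (rank-1) updates
can be derived from the partitioning

$$A = \begin{bmatrix} \alpha & v^T \\ v & B \end{bmatrix} = \begin{bmatrix} \beta & 0 \\ v/\beta & I_{n-1} \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 0 & B - vv^T/\alpha \end{bmatrix}\begin{bmatrix} \beta & v^T/\beta \\ 0 & I_{n-1} \end{bmatrix}. \qquad (4.2.6)$$

Here, $\beta = \sqrt{\alpha}$ and we know that $\alpha > 0$ because A is positive definite. Note
that $B - vv^T/\alpha$ is positive definite because it is a principal submatrix of
$X^TAX$ where

$$X = \begin{bmatrix} 1 & -v^T/\alpha \\ 0 & I_{n-1} \end{bmatrix}.$$

If we have the Cholesky factorization $G_1G_1^T = B - vv^T/\alpha$, then from (4.2.6)
it follows that $A = GG^T$ with

$$G = \begin{bmatrix} \beta & 0 \\ v/\beta & G_1 \end{bmatrix}.$$

Thus, the Cholesky factorization can be obtained through the repeated
application of (4.2.6), much in the style of kji Gaussian elimination.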
Algorithm 4.2.2 (Cholesky: Outer Product Version) Given a symmetric
positive definite $A \in \mathbb{R}^{n\times n}$, the following algorithm computes a lower
triangular $G \in \mathbb{R}^{n\times n}$ such that $A = GG^T$. For all $i \ge j$, G(i, j) overwrites
A(i, j).

for k = 1:n
    A(k, k) = √A(k, k)
    A(k + 1:n, k) = A(k + 1:n, k)/A(k, k)
    for j = k + 1:n
        A(j:n, j) = A(j:n, j) - A(j:n, k)A(j, k)
    end
end

This algorithm involves $n^3/3$ flops. Note that the j-loop computes the lower
triangular part of the outer product update

$$A(k{+}1{:}n, k{+}1{:}n) = A(k{+}1{:}n, k{+}1{:}n) - A(k{+}1{:}n, k)\,A(k{+}1{:}n, k)^T.$$

Recalling our discussion in §1.4.8 about gaxpy versus outer product
updates, it is easy to show that Algorithm 4.2.1 involves fewer vector touches
than Algorithm 4.2.2 by a factor of two.
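For comparison, here is a sketch of the outer product version under the same assumptions (list-of-rows storage, 0-based indices, hypothetical name `chol_outer`). Only the lower triangle is read or written; if a negative pivot turns up, `math.sqrt` raises `ValueError`.

```python
import math

# Illustrative sketch of Algorithm 4.2.2; the function name is hypothetical.
def chol_outer(A):
    """Overwrite A(i, j), i >= j, with G(i, j) where A = G G^T."""
    n = len(A)
    for k in range(n):
        A[k][k] = math.sqrt(A[k][k])
        for i in range(k + 1, n):
            A[i][k] /= A[k][k]
        # Lower triangle of the rank-1 update
        # A(k+1:n, k+1:n) -= A(k+1:n, k) A(k+1:n, k)^T
        for j in range(k + 1, n):
            for i in range(j, n):
                A[i][j] -= A[i][k] * A[j][k]
    return A
```

For A = [[4, 2, -2], [2, 10, 5], [-2, 5, 6]] the lower triangle becomes G = [[2, 0, 0], [1, 3, 0], [-1, 2, 1]].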


4.2.6 Block Dot Product Cholesky

Suppose $A \in \mathbb{R}^{n\times n}$ is symmetric positive definite. Regard $A = (A_{ij})$ and its
Cholesky factor $G = (G_{ij})$ as N-by-N block matrices with square diagonal
blocks. By equating (i, j) blocks in the equation $A = GG^T$ with $i \ge j$ it
follows that

$$A_{ij} = \sum_{k=1}^{j} G_{ik}G_{jk}^T.$$

Defining

$$S = A_{ij} - \sum_{k=1}^{j-1} G_{ik}G_{jk}^T$$

we see that $G_{jj}G_{jj}^T = S$ if i = j and that $G_{ij}G_{jj}^T = S$ if i > j. Properly
sequenced, these equations can be arranged to compute all the $G_{ij}$:

Algorithm 4.2.3 (Cholesky: Block Dot Product Version) Given a
symmetric positive definite $A \in \mathbb{R}^{n\times n}$, the following algorithm computes a
lower triangular $G \in \mathbb{R}^{n\times n}$ such that $A = GG^T$. The lower triangular part
of A is overwritten by the lower triangular part of G. A is regarded as an
N-by-N block matrix with square diagonal blocks.

for j = 1:N
    for i = j:N
        S = A_ij - Σ_{k=1}^{j-1} G_ik G_jk^T
        if i = j
            Compute Cholesky factorization S = G_jj G_jj^T.
        else
            Solve G_ij G_jj^T = S for G_ij.
        end
        Overwrite A_ij with G_ij.
    end
end


The overall process involves $n^3/3$ flops like the other Cholesky procedures
that we have developed. The procedure is rich in matrix multiplication
assuming a suitable blocking of the matrix A. For example, if n = rN and
each $A_{ij}$ is r-by-r, then the level-3 fraction is approximately $1 - (1/N^2)$.

Algorithm 4.2.3 is incomplete in the sense that we have not specified how
the products $G_{ik}G_{jk}^T$ are formed or how the r-by-r Cholesky factorizations
$S = G_{jj}G_{jj}^T$ are computed. These important details would have to be
worked out carefully in order to extract high performance.
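To make the block bookkeeping concrete, here is a sketch of Algorithm 4.2.3 in pure Python. It assumes a uniform block size r that divides n (so n = rN), list-of-rows storage, and the hypothetical name `chol_block`. The r-by-r Cholesky factorizations and the triangular solves for $G_{ij}$ are done with simple unblocked loops, so the sketch shows the data flow of the algorithm but none of its level-3 performance.

```python
import math

# Illustrative sketch of Algorithm 4.2.3; the function name is hypothetical.
def chol_block(A, r):
    """Overwrite the lower triangle of SPD A (list of rows) with G, A = G G^T.
    A is viewed as an N-by-N block matrix with r-by-r blocks (n = r*N)."""
    n = len(A)
    if n % r != 0:
        raise ValueError("this sketch assumes n = r*N")
    N = n // r

    def idx(b):                        # row/column indices of block b
        return range(b * r, (b + 1) * r)

    for J in range(N):
        g0 = J * r                     # offset of the diagonal block G_JJ
        for I in range(J, N):
            # S = A_IJ - sum_{K<J} G_IK G_JK^T, using columns 0..g0-1 of G
            S = [[A[i][j] - sum(A[i][k] * A[j][k] for k in range(g0))
                  for j in idx(J)] for i in idx(I)]
            if I == J:
                # Unblocked Cholesky S = G_JJ G_JJ^T on the lower triangle of S
                for c in range(r):
                    S[c][c] = math.sqrt(S[c][c] - sum(S[c][k] ** 2 for k in range(c)))
                    for a in range(c + 1, r):
                        t = S[a][c] - sum(S[a][k] * S[c][k] for k in range(c))
                        S[a][c] = t / S[c][c]
            else:
                # Solve G_IJ G_JJ^T = S row by row: G_JJ x = s (forward substitution)
                for a in range(r):
                    for c in range(r):
                        t = S[a][c] - sum(S[a][k] * A[g0 + c][g0 + k] for k in range(c))
                        S[a][c] = t / A[g0 + c][g0 + c]
            # Overwrite A_IJ with G_IJ (lower triangle only when I == J)
            for a, i in enumerate(idx(I)):
                for c, j in enumerate(idx(J)):
                    if i >= j:
                        A[i][j] = S[a][c]
    return A
```

With r = 1 the loops reduce to an unblocked dot product Cholesky, and with r = n all the work is done by the single diagonal-block factorization; every admissible block size should give the same G up to rounding.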

Another block procedure can be derived from the gaxpy Cholesky algorithm.
After r steps of Algorithm 4.2.1 we know the matrices $G_{11} \in \mathbb{R}^{r\times r}$
and $G_{21} \in \mathbb{R}^{(n-r)\times r}$ in

$$\begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} = \begin{bmatrix} G_{11} & 0 \\ G_{21} & I_{n-r} \end{bmatrix}\begin{bmatrix} I_r & 0 \\ 0 & \tilde A \end{bmatrix}\begin{bmatrix} G_{11} & 0 \\ G_{21} & I_{n-r} \end{bmatrix}^T.$$

We then perform r more steps of gaxpy Cholesky not on A but on the
reduced matrix $\tilde A = A_{22} - G_{21}G_{21}^T$, which we explicitly form exploiting
symmetry. Continuing in this way we obtain a block Cholesky algorithm
whose kth step involves r gaxpy Cholesky steps on a matrix of order
n - (k - 1)r followed by a level-3 computation having order n - kr. The level-3
fraction is approximately equal to 1 - 3/(2N) if n = rN.


4.2.7 Stability of the Cholesky Process

In exact arithmetic, we know that a symmetric positive definite matrix
has a Cholesky factorization. Conversely, if the Cholesky process runs to
completion with strictly positive square roots, then A is positive definite.
Thus, to find out if a matrix A is positive definite, we merely try to compute
its Cholesky factorization using any of the methods given above.

The situation in the context of roundoff error is more interesting. The
numerical stability of the Cholesky algorithm roughly follows from the
inequality

$$g_{ij}^2 \le \sum_{k=1}^{i} g_{ik}^2 = a_{ii}.$$

This shows that the entries in the Cholesky triangle are nicely bounded.
The same conclusion can be reached from the equation $\|G\|_2^2 = \|A\|_2$.

The roundoff errors associated with the Cholesky factorization have
been extensively studied in a classical paper by Wilkinson (1968). Using
the results in this paper, it can be shown that if $\hat x$ is the computed solution
to Ax = b, obtained via any of our Cholesky procedures, then $\hat x$ solves
the perturbed system $(A + E)\hat x = b$ where $\|E\|_2 \le c_nu\|A\|_2$ and $c_n$
is a small constant depending upon n. Moreover, Wilkinson shows that if
$q_nu\kappa_2(A) \le 1$ where $q_n$ is another small constant, then the Cholesky process
runs to completion, i.e., no square roots of negative numbers arise.


Example 4.2.2 If Algorithm 4.2.2 is applied to the positive definite matrix

$$A = \begin{bmatrix} 100 & 15 & .01 \\ 15 & 2.3 & .01 \\ .01 & .01 & 1.00 \end{bmatrix}$$

and $\beta = 10$, $t = 2$, rounded arithmetic is used, then $g_{11} = 10$, $g_{21} = 1.5$, $g_{31} = .001$ and
$g_{22} = 0.00$. The algorithm then breaks down trying to compute $g_{32}$.


4.2.8 The Semidefinite Case

A matrix is said to be positive semidefinite if $x^TAx \ge 0$ for all vectors
x. Symmetric positive semidefinite (sps) matrices are important and we
briefly discuss some Cholesky-like manipulations that can be used to solve
various sps problems. Results about the diagonal entries in an sps matrix
are needed first.

Theorem 4.2.6 If $A \in \mathbb{R}^{n\times n}$ is symmetric positive semidefinite, then

$$|a_{ij}| \le (a_{ii} + a_{jj})/2 \qquad (4.2.7)$$

$$|a_{ij}| \le \sqrt{a_{ii}a_{jj}} \quad (i \ne j) \qquad (4.2.8)$$

$$\max_{i,j} |a_{ij}| = \max_i a_{ii} \qquad (4.2.9)$$

$$a_{ii} = 0 \;\Rightarrow\; A(i, :) = 0,\ A(:, i) = 0 \qquad (4.2.10)$$

Proof. If $x = e_i + e_j$ then $0 \le x^TAx = a_{ii} + a_{jj} + 2a_{ij}$ while $x = e_i - e_j$
implies $0 \le x^TAx = a_{ii} + a_{jj} - 2a_{ij}$. Inequality (4.2.7) follows from these
two results. Equation (4.2.9) is an easy consequence of (4.2.7).

To prove (4.2.8) assume without loss of generality that i = 1 and j = 2
and consider the inequality

$$0 \le \begin{bmatrix} x \\ 1 \end{bmatrix}^T \begin{bmatrix} a_{11} & a_{12} \\ a_{12} & a_{22} \end{bmatrix} \begin{bmatrix} x \\ 1 \end{bmatrix} = a_{11}x^2 + 2a_{12}x + a_{22}$$

which holds since A(1:2, 1:2) is also semidefinite. This is a quadratic equation
in x and for the inequality to hold, the discriminant $4a_{12}^2 - 4a_{11}a_{22}$
must be nonpositive. Implication (4.2.10) follows from (4.2.8). □


Consider what happens when outer product Cholesky is applied to an sps
matrix. If a zero A(k, k) is encountered then from (4.2.10) A(k:n, k) is zero
and there is "nothing to do" and we obtain

for k = 1:n
    if A(k, k) > 0
        A(k, k) = √A(k, k)
        A(k + 1:n, k) = A(k + 1:n, k)/A(k, k)
        for j = k + 1:n
            A(j:n, j) = A(j:n, j) - A(j:n, k)A(j, k)          (4.2.11)
        end
    end
end

Thus, a simple change makes Algorithm 4.2.2 applicable to the semidefinite
case. However, in practice rounding errors preclude the generation of exact
zeros and it may be preferable to incorporate pivoting.
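A sketch of (4.2.11) in Python, under the same list-of-rows assumptions as before (hypothetical name `chol_semidef`). The test `A[k][k] > 0` relies on exact zeros, which, as just noted, rounding errors rarely deliver.

```python
import math

# Illustrative sketch of (4.2.11); the function name is hypothetical.
def chol_semidef(A):
    """Outer product Cholesky for a symmetric positive semidefinite A that
    skips zero pivots. Only the lower triangle is read or written."""
    n = len(A)
    for k in range(n):
        if A[k][k] > 0:                # exact-zero test, as in the text
            A[k][k] = math.sqrt(A[k][k])
            for i in range(k + 1, n):
                A[i][k] /= A[k][k]
            for j in range(k + 1, n):
                for i in range(j, n):
                    A[i][j] -= A[i][k] * A[j][k]
        # otherwise, by (4.2.10), A(k:n, k) is zero and there is nothing to do
    return A
```

For the rank-2 matrix [[4, 2, 0], [2, 1, 0], [0, 0, 9]] the second pivot is exactly zero and is skipped, leaving G = [[2, 0, 0], [1, 0, 0], [0, 0, 3]] in the lower triangle.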


4.2.9 Symmetric Pivoting

To preserve symmetry in a symmetric A we only consider data reorderings
of the form $PAP^T$ where P is a permutation. Row permutations (A ← PA)
or column permutations (A ← AP) alone destroy symmetry. An update of
the form

$$A \leftarrow PAP^T$$

is called a symmetric permutation of A. Note that such an operation does
not move off-diagonal elements to the diagonal. The diagonal of $PAP^T$ is
a reordering of the diagonal of A.

Suppose at the beginning of the kth step in (4.2.11) we symmetrically
permute the largest diagonal entry of A(k:n, k:n) into the lead position.
If that largest diagonal entry is zero then A(k:n, k:n) = 0 by virtue of
(4.2.10). In this way we can compute the factorization $PAP^T = GG^T$
where $G \in \mathbb{R}^{n\times(k-1)}$ is lower triangular.

Algorithm 4.2.4 Suppose $A \in \mathbb{R}^{n\times n}$ is symmetric positive semidefinite
and that rank(A) = r. The following algorithm computes a permutation P,
the index r, and an n-by-r lower triangular matrix G such that $PAP^T = GG^T$.
The lower triangular part of A(:, 1:r) is overwritten by the lower
triangular part of G. $P = P_r \cdots P_1$ where $P_k$ is the identity with rows k
and piv(k) interchanged.

r = 0
for k = 1:n
    Find q (k ≤ q ≤ n) so A(q, q) = max{A(k, k), ..., A(n, n)}
    if A(q, q) > 0
        r = r + 1
        piv(k) = q
        A(k, :) ↔ A(q, :)
        A(:, k) ↔ A(:, q)
        A(k, k) = √A(k, k)
        A(k + 1:n, k) = A(k + 1:n, k)/A(k, k)
        for j = k + 1:n
            A(j:n, j) = A(j:n, j) - A(j:n, k)A(j, k)
        end
    end
end


In practice, a tolerance is used to detect small A(k, k). However, the sit- 
uation is quite tricky and the reader should consult Higham (1989). In 
addition, §5.5 has a discussion of tolerances in the rank detection problem. 
Finally, we remark that a truly efficient implementation of Algorithm 4.2.4 
would only access the lower triangular portion of A. 
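A sketch of Algorithm 4.2.4 in Python follows, with list-of-rows storage and the hypothetical name `chol_pivoted`. Two choices are ours: `tol` is a hypothetical threshold (tol = 0 reproduces the exact test A(q, q) > 0), and, unlike the j-loop above, the sketch updates the full trailing submatrix so that the complete row and column interchanges always move up-to-date entries. An implementation that touches only the lower triangle, as just recommended, would have to carry out the interchanges within the lower triangle instead.

```python
import math

# Illustrative sketch of Algorithm 4.2.4; the name and `tol` are hypothetical.
def chol_pivoted(A, tol=0.0):
    """Cholesky with symmetric pivoting for a symmetric positive semidefinite A.
    Returns (piv, r); on exit the lower triangle of columns 0..r-1 of A holds G
    with P A P^T = G G^T, where P applies the swaps k <-> piv[k], k = 0, 1, ..."""
    n = len(A)
    piv = list(range(n))
    r = 0
    for k in range(n):
        q = max(range(k, n), key=lambda i: A[i][i])
        if A[q][q] > tol:
            r += 1
            piv[k] = q
            A[k], A[q] = A[q], A[k]            # A(k,:) <-> A(q,:)
            for row in A:                      # A(:,k) <-> A(:,q)
                row[k], row[q] = row[q], row[k]
            A[k][k] = math.sqrt(A[k][k])
            for i in range(k + 1, n):
                A[i][k] /= A[k][k]
            # Update the whole trailing block (both triangles) so later
            # interchanges only ever move current values.
            for j in range(k + 1, n):
                for i in range(k + 1, n):
                    A[i][j] -= A[i][k] * A[j][k]
    return piv, r
```

For the rank-2 matrix [[4, 2, 0], [2, 1, 0], [0, 0, 9]] the sketch returns r = 2, with G = [[3, 0], [0, 2], [0, 1]] in the first two columns.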


4.2.10 The Polar Decomposition and Square Root

Let $A = U_1\Sigma V^T$ be the thin SVD of $A \in \mathbb{R}^{m\times n}$ where $m \ge n$. Note that

$$A = (U_1V^T)(V\Sigma V^T) \equiv ZP \qquad (4.2.12)$$

where $Z = U_1V^T$ and $P = V\Sigma V^T$. Z has orthonormal columns and P is
symmetric positive semidefinite because

$$x^TPx = (V^Tx)^T\Sigma(V^Tx) = \sum_{k=1}^{n} \sigma_k y_k^2 \ge 0$$

where $y = V^Tx$. The decomposition (4.2.12) is called the polar decomposition
because it is analogous to the complex number factorization
$z = e^{i\arg(z)}|z|$. See §12.4.1 for further discussion.

Another important decomposition is the matrix square root. Suppose
$A \in \mathbb{R}^{n\times n}$ is symmetric positive semidefinite and that $A = GG^T$ is its
Cholesky factorization. If $G = U\Sigma V^T$ is G's SVD and $X = U\Sigma U^T$, then
X is symmetric positive semidefinite and

$$A = GG^T = (U\Sigma V^T)(U\Sigma V^T)^T = U\Sigma^2U^T = (U\Sigma U^T)(U\Sigma U^T) = X^2.$$

Thus, X is a square root of A. It can be shown (most easily with eigenvalue
theory) that a symmetric positive semidefinite matrix has a unique
symmetric positive semidefinite square root.
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Problems 


P4.2.1 Suppose that $H = A + iB$ is Hermitian and positive definite with $A, B \in \mathbb{R}^{n\times n}$. This means that $x^HHx > 0$ whenever $x \ne 0$. (a) Show that

$$C = \begin{bmatrix} A & -B \\ B & A \end{bmatrix}$$

is symmetric and positive definite. (b) Formulate an algorithm for solving $(A + iB)(x + iy) = (b + ic)$, where $b$, $c$, $x$, and $y$ are in $\mathbb{R}^n$. It should involve $8n^3/3$ flops. How much storage is required?

P4.2.2 Suppose $A \in \mathbb{R}^{n\times n}$ is symmetric and positive definite. Give an algorithm for computing an upper triangular matrix $R \in \mathbb{R}^{n\times n}$ such that $A = RR^T$.

P4.2.3 Let $A \in \mathbb{R}^{n\times n}$ be positive definite and set $T = (A + A^T)/2$ and $S = (A - A^T)/2$. (a) Show that $\|A^{-1}\|_2 \le \|T^{-1}\|_2$ and $x^TA^{-1}x \le x^TT^{-1}x$ for all $x \in \mathbb{R}^n$. (b) Show that if $A = LDM^T$, then $d_k \ge 1/\|T^{-1}\|_2$ for $k = 1:n$.

P4.2.4 Find a 2-by-2 real matrix $A$ with the property that $x^TAx > 0$ for all real nonzero 2-vectors but which is not positive definite when regarded as a member of $\mathbb{C}^{2\times 2}$.

P4.2.5 Suppose $A \in \mathbb{R}^{n\times n}$ has a positive diagonal. Show that if both $A$ and $A^T$ are strictly diagonally dominant, then $A$ is positive definite.

P4.2.6 Show that the function $f(x) = (x^TAx)^{1/2}$ is a vector norm on $\mathbb{R}^n$ if and only if $A$ is positive definite.

P4.2.7 Modify Algorithm 4.2.1 so that if the square root of a negative number is encountered, then the algorithm finds a unit vector $x$ so $x^TAx < 0$ and terminates.

P4.2.8 The numerical range $W(A)$ of a complex matrix $A$ is defined to be the set $W(A) = \{x^HAx : x^Hx = 1\}$. Show that if $0 \notin W(A)$, then $A$ has an LU factorization.


P4.2.9 Formulate an $m < n$ version of the polar decomposition for $A \in \mathbb{R}^{m\times n}$.


P4.2.10 Suppose $A = I + uu^T$ where $A \in \mathbb{R}^{n\times n}$ and $\|u\|_2 = 1$. Give explicit formulae for the diagonal and subdiagonal of $A$'s Cholesky factor.

P4.2.11 Suppose $A \in \mathbb{R}^{n\times n}$ is symmetric positive definite and that its Cholesky factor is available. Let $e_k = I_n(:,k)$. For $1 \le i < j \le n$, let $\alpha_{ij}$ be the smallest real that makes $A + \alpha(e_ie_j^T + e_je_i^T)$ singular. Likewise, let $\alpha_{ii}$ be the smallest real that makes $(A + \alpha e_ie_i^T)$ singular. Show how to compute these quantities using the Sherman-Morrison-Woodbury formula. How many flops are required to find all the $\alpha_{ij}$?


Notes and References for Sec. 4.2 


The definiteness of the quadratic form $x^TAx$ can frequently be established by considering the mathematics of the underlying problem. For example, the discretization of certain partial differential operators gives rise to provably positive definite matrices. Aspects of the unsymmetric positive definite problem are discussed in


A. Buckley (1974). “A Note on Matrices A = I + H, H Skew-Symmetric,” Z. Angew. Math. Mech. 54, 125-26.




A. Buckley (1977). “On the Solution of Certain Skew-Symmetric Linear Systems,” SIAM 
J. Num. Anal. 14, 566-70. 

G.H. Golub and C. Van Loan (1979). “Unsymmetric Positive Definite Linear Systems,” Lin. Alg. and Its Applic. 28, 85-98.

R. Mathias (1992). "Matrices with Positive Definite Hermitian Part: Inequalities and 
Linear Systems,” SIAM J. Matrix Anal. Appl. 13, 640-654. 


Symmetric positive definite systems constitute the most important class of special $Ax = b$ problems. Algol programs for these problems are given in


R.S. Martin, G. Peters, and J.H. Wilkinson (1965). “Symmetric Decomposition of a Positive Definite Matrix,” Numer. Math. 7, 362-83.

R.S. Martin, G. Peters, and J.H. Wilkinson (1966). “Iterative Refinement of the Solution of a Positive Definite System of Equations,” Numer. Math. 8, 203-16.

F.L. Bauer and C. Reinsch (1971). “Inversion of Positive Definite Matrices by the Gauss-Jordan Method,” in Handbook for Automatic Computation Vol. 2, Linear Algebra, J.H. Wilkinson and C. Reinsch, eds. Springer-Verlag, New York, 45-49.


The roundoff errors associated with the method are analyzed in 


J.H. Wilkinson (1968). “A Priori Error Analysis of Algebraic Processes,” Proc. International Congress Math. (Moscow: Izdat. Mir, 1968), pp. 629-39.

J. Meinguet (1983). “Refined Error Analyses of Cholesky Factorization,” SIAM J. Numer. Anal. 20, 1243-1250.

A. Kielbasinski (1987). “A Note on Rounding Error Analysis of Cholesky Factorization,” Lin. Alg. and Its Applic. 88/89, 481-494.

N.J. Higham (1990). “Analysis of the Cholesky Decomposition of a Semidefinite Matrix,” in Reliable Numerical Computation, M.G. Cox and S.J. Hammarling (eds), Oxford University Press, Oxford, UK, 161-185.

R. Carter (1991). “Y-MP Floating Point and Cholesky Factorization,” Int'l J. High Speed Computing 3, 215-222.

J-Guang Sun (1992). “Rounding Error and Perturbation Bounds for the Cholesky and LDL^T Factorizations,” Lin. Alg. and Its Applic. 173, 7-97.


The question of how the Cholesky triangle $G$ changes when $A = GG^T$ is perturbed is analyzed in

G.W. Stewart (1977b). “Perturbation Bounds for the QR Factorization of a Matrix,” SIAM J. Num. Anal. 14, 509-18.

Z. Drmač, M. Omladič, and K. Veselič (1994). “On the Perturbation of the Cholesky Factorization,” SIAM J. Matrix Anal. Appl. 15, 1319-1332.


Nearness/sensitivity issues associated with positive semidefiniteness and the polar decomposition are presented in

N.J. Higham (1988). “Computing a Nearest Symmetric Positive Semidefinite Matrix,” Lin. Alg. and Its Applic. 103, 103-118.

R. Mathias (1993). “Perturbation Bounds for the Polar Decomposition,” SIAM J. Matrix Anal. Appl. 14, 588-597.

R-C. Li (1995). “New Perturbation Bounds for the Unitary Polar Factor,” SIAM J. Matrix Anal. Appl. 16, 327-332.

Computationally-oriented references for the polar decomposition and the square root are given in §8.6 and §11.2 respectively.
4.3 Banded Systems


In many applications that involve linear systems, the matrix of coefficients is banded. This is the case whenever the equations can be ordered so that each unknown $x_i$ appears in only a few equations in a “neighborhood” of the $i$th equation. Formally, we say that $A = (a_{ij})$ has upper bandwidth $q$ if $a_{ij} = 0$ whenever $j > i + q$ and lower bandwidth $p$ if $a_{ij} = 0$ whenever $i > j + p$. Substantial economies can be realized when solving banded systems because the triangular factors in LU, $GG^T$, $LDM^T$, etc., are also banded.
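The bandwidth definitions are easy to check mechanically. The helper below is a hypothetical sketch (NumPy assumed) that returns (p, q) by scanning the nonzero entries of a square array, so a tridiagonal matrix gives (1, 1) and an n-by-n upper Hessenberg matrix gives (1, n - 1).

```python
import numpy as np


def bandwidths(A):
    """Return (p, q): lower and upper bandwidth of A (0 for a diagonal matrix)."""
    i, j = np.nonzero(A)
    if i.size == 0:
        return 0, 0
    p = int(max(0, np.max(i - j)))   # farthest nonzero below the diagonal
    q = int(max(0, np.max(j - i)))   # farthest nonzero above the diagonal
    return p, q
```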

Before proceeding, the reader is advised to review §1.2 where several aspects of band matrix manipulation are discussed.


4.3.1 Band LU Factorization 


Our first result shows that if A is banded and A = LU then L(U) inherits 
the lower (upper) bandwidth of A. 


Theorem 4.3.1 Suppose $A \in \mathbb{R}^{n\times n}$ has an LU factorization $A = LU$. If $A$ has upper bandwidth $q$ and lower bandwidth $p$, then $U$ has upper bandwidth $q$ and $L$ has lower bandwidth $p$.


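Proof. The proof is by induction on $n$. From (3.2.6) we have the factorization

$$A = \begin{bmatrix} \alpha & w^T \\ v & B \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ v/\alpha & I_{n-1} \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & B - vw^T/\alpha \end{bmatrix} \begin{bmatrix} \alpha & w^T \\ 0 & I_{n-1} \end{bmatrix}.$$

It is clear that $B - vw^T/\alpha$ has upper bandwidth $q$ and lower bandwidth $p$ because only the first $q$ components of $w$ and the first $p$ components of $v$ are nonzero. Let $L_1U_1$ be the LU factorization of this matrix. Using the induction hypothesis and the sparsity of $w$ and $v$, it follows that

$$L = \begin{bmatrix} 1 & 0 \\ v/\alpha & L_1 \end{bmatrix} \quad\text{and}\quad U = \begin{bmatrix} \alpha & w^T \\ 0 & U_1 \end{bmatrix}$$

have the desired bandwidth properties and satisfy $A = LU$. □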


The specialization of Gaussian elimination to banded matrices having an 
LU factorization is straightforward. 


Algorithm 4.3.1 (Band Gaussian Elimination: Outer Product Version) Given $A \in \mathbb{R}^{n\times n}$ with upper bandwidth $q$ and lower bandwidth $p$, the following algorithm computes the factorization $A = LU$, assuming it exists. A(i,j) is overwritten by L(i,j) if i > j and by U(i,j) otherwise.

for k = 1:n-1
    for i = k+1:min(k+p, n)
        A(i,k) = A(i,k)/A(k,k)
    end
    for j = k+1:min(k+q, n)
        for i = k+1:min(k+p, n)
            A(i,j) = A(i,j) - A(i,k)A(k,j)
        end
    end
end


If $n \gg p$ and $n \gg q$ then this algorithm involves about $2npq$ flops. Band versions of Algorithm 4.1.1 ($LDM^T$) and all the Cholesky procedures also exist, but we leave their formulation to the exercises.
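A NumPy transcription of Algorithm 4.3.1 follows as a sketch (NumPy assumed; `band_lu` is our name). Like the pseudocode, it keeps A in a full n-by-n array and does no pivoting, so it presumes the LU factorization exists; the two inner loops are collapsed into one rank-1 update on the window of at most p rows and q columns.

```python
import numpy as np


def band_lu(A, p, q):
    """Return a copy of A overwritten with L (strictly below diagonal) and U."""
    A = np.array(A, dtype=float)
    n = A.shape[0]
    for k in range(n - 1):
        ilast = min(k + p, n - 1)        # last row with a nonzero in column k
        jlast = min(k + q, n - 1)        # last column with a nonzero in row k
        A[k + 1:ilast + 1, k] /= A[k, k]                     # multipliers
        # rank-1 update confined to the band window
        A[k + 1:ilast + 1, k + 1:jlast + 1] -= np.outer(
            A[k + 1:ilast + 1, k], A[k, k + 1:jlast + 1])
    return A
```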


4.3.2 Band Triangular System Solving


Analogous savings can also be made when solving banded triangular sys- 
tems. 


Algorithm 4.3.2 (Band Forward Substitution: Column Version) Let $L \in \mathbb{R}^{n\times n}$ be a unit lower triangular matrix having lower bandwidth $p$. Given $b \in \mathbb{R}^n$, the following algorithm overwrites $b$ with the solution to $Lx = b$.

for j = 1:n
    for i = j+1:min(j+p, n)
        b(i) = b(i) - L(i,j)b(j)
    end
end

If $n \gg p$ then this algorithm requires about $2np$ flops.
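A sketch of Algorithm 4.3.2 in NumPy (assumed; `band_forward_sub` is a hypothetical name). The inner i-loop becomes a vector update of at most p entries, which is where the 2np flop count comes from.

```python
import numpy as np


def band_forward_sub(L, b, p):
    """Solve L x = b, L unit lower triangular with lower bandwidth p (column version)."""
    x = np.array(b, dtype=float)         # plays the role of the overwritten b
    n = x.size
    for j in range(n):
        last = min(j + p, n - 1)         # last row with a nonzero in column j
        x[j + 1:last + 1] -= L[j + 1:last + 1, j] * x[j]
    return x
```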


Algorithm 4.3.3 (Band Back-Substitution: Column Version) Let $U \in \mathbb{R}^{n\times n}$ be a nonsingular upper triangular matrix having upper bandwidth $q$. Given $b \in \mathbb{R}^n$, the following algorithm overwrites $b$ with the solution to $Ux = b$.

for j = n:-1:1
    b(j) = b(j)/U(j,j)
    for i = max(1, j-q):j-1
        b(i) = b(i) - U(i,j)b(j)
    end
end

If $n \gg q$ then this algorithm requires about $2nq$ flops.
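The companion sketch for Algorithm 4.3.3, under the same assumptions (NumPy, the hypothetical name `band_back_sub`, and U held in a full array):

```python
import numpy as np


def band_back_sub(U, b, q):
    """Solve U x = b, U nonsingular upper triangular with upper bandwidth q."""
    x = np.array(b, dtype=float)
    n = x.size
    for j in range(n - 1, -1, -1):
        x[j] /= U[j, j]
        first = max(0, j - q)            # first row with a nonzero in column j
        x[first:j] -= U[first:j, j] * x[j]
    return x
```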
4.3.3 Band Gaussian Elimination with Pivoting

Gaussian elimination with partial pivoting can also be specialized to exploit band structure in $A$. If, however, $PA = LU$, then the band properties of $L$ and $U$ are not quite so simple. For example, if $A$ is tridiagonal and the first two rows are interchanged at the very first step of the algorithm, then $u_{13}$ is nonzero. Consequently, row interchanges expand bandwidth. Precisely how the band enlarges is the subject of the following theorem.


Theorem 4.3.2 Suppose $A \in \mathbb{R}^{n\times n}$ is nonsingular and has upper and lower bandwidths $q$ and $p$, respectively. If Gaussian elimination with partial pivoting is used to compute Gauss transformations

$$M_j = I - \alpha^{(j)}e_j^T, \qquad j = 1:n-1$$

and permutations $P_1, \ldots, P_{n-1}$ such that $M_{n-1}P_{n-1}\cdots M_1P_1A = U$ is upper triangular, then $U$ has upper bandwidth $p + q$ and $\alpha_i^{(j)} = 0$ whenever $i \le j$ or $i > j + p$.

Proof. Let $PA = LU$ be the factorization computed by Gaussian elimination with partial pivoting and recall that $P = P_{n-1}\cdots P_1$. Write $P^T = [e_{s_1}, \ldots, e_{s_n}]$, where $\{s_1, \ldots, s_n\}$ is a permutation of $\{1, 2, \ldots, n\}$. If $s_i > i + p$ then it follows that the leading $i$-by-$i$ principal submatrix of $PA$ is singular, since $(PA)_{ij} = a_{s_i,j} = 0$ for $j = 1:s_i - p - 1$ and $s_i - p - 1 \ge i$. This implies that $U$ and $A$ are singular, a contradiction. Thus, $s_i \le i + p$ for $i = 1:n$ and therefore, $PA$ has upper bandwidth $p + q$. It follows from Theorem 4.3.1 that $U$ has upper bandwidth $p + q$.

The assertion about the $\alpha^{(j)}$ can be verified by observing that $M_j$ need only zero elements $(j+1, j), \ldots, (j+p, j)$ of the partially reduced matrix $P_jM_{j-1}P_{j-1}\cdots M_1P_1A$. □


Thus, pivoting destroys band structure in the sense that $U$ becomes “wider” than $A$'s upper triangle, while nothing at all can be said about the bandwidth of $L$. However, since the $j$th column of $L$ is a permutation of the $j$th Gauss vector $\alpha^{(j)}$, it follows that $L$ has at most $p + 1$ nonzero elements per column.
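Theorem 4.3.2 can be observed experimentally. The sketch below uses a plain Gaussian elimination with partial pivoting written only for this illustration (not a library routine; NumPy assumed) on a random matrix with lower bandwidth p and upper bandwidth q. One can then check that U has upper bandwidth at most p + q and that no column of L has more than p + 1 nonzeros; structural zeros stay exactly zero in floating point here because each update of such an entry subtracts an exact zero.

```python
import numpy as np


def gepp(A):
    """Return perm, L, U with A[perm] = L @ U (partial pivoting)."""
    U = np.array(A, dtype=float)
    n = U.shape[0]
    L = np.eye(n)
    perm = np.arange(n)
    for k in range(n - 1):
        q = k + int(np.argmax(np.abs(U[k:, k])))   # pivot row
        U[[k, q], :] = U[[q, k], :]
        L[[k, q], :k] = L[[q, k], :k]               # permute earlier multipliers
        perm[[k, q]] = perm[[q, k]]
        if U[k, k] != 0:
            L[k + 1:, k] = U[k + 1:, k] / U[k, k]
            U[k + 1:, k:] -= np.outer(L[k + 1:, k], U[k, k:])
    return perm, L, np.triu(U)


def upper_bandwidth(U):
    """Largest j - i over the nonzero entries of U."""
    i, j = np.nonzero(U)
    return int(np.max(j - i)) if i.size else 0
```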


4.3.4 Hessenberg LU 


As an example of an unsymmetric band matrix computation, we show how Gaussian elimination with partial pivoting can be applied to factor an upper Hessenberg matrix $H$. (Recall that if $H$ is upper Hessenberg then $h_{ij} = 0$ whenever $i > j + 1$.) After $k - 1$ steps of Gaussian elimination with partial pivoting we are left with an upper Hessenberg matrix of the form:

$$\begin{bmatrix} \times & \times & \times & \times & \times \\ 0 & \times & \times & \times & \times \\ 0 & 0 & \times & \times & \times \\ 0 & 0 & \times & \times & \times \\ 0 & 0 & 0 & \times & \times \end{bmatrix} \qquad (k = 3,\ n = 5)$$

By virtue of the special structure of this matrix, we see that the next permutation, $P_3$, is either the identity or the identity with rows 3 and 4 interchanged. Moreover, the next Gauss transformation $M_k$ has a single nonzero multiplier in the $(k+1, k)$ position. This illustrates the $k$th step of the following algorithm.


Algorithm 4.3.4 (Hessenberg LU) Given an upper Hessenberg matrix $H \in \mathbb{R}^{n\times n}$, the following algorithm computes the upper triangular matrix $M_{n-1}P_{n-1}\cdots M_1P_1H = U$ where each $P_k$ is a permutation and each $M_k$ is a Gauss transformation whose entries are bounded by unity. H(i,k) is overwritten with U(i,k) if i ≤ k and by $(M_k)_{k+1,k}$ if i = k+1. An integer vector piv(1:n-1) encodes the permutations. If $P_k = I$, then piv(k) = 0. If $P_k$ interchanges rows k and k+1, then piv(k) = 1.

for k = 1:n-1
    if |H(k,k)| < |H(k+1,k)|
        piv(k) = 1; H(k,k:n) ↔ H(k+1,k:n)
    else
        piv(k) = 0
    end
    if H(k,k) ≠ 0
        t = -H(k+1,k)/H(k,k)
        for j = k+1:n
            H(k+1,j) = H(k+1,j) + tH(k,j)
        end
        H(k+1,k) = t
    end
end

This algorithm requires $n^2$ flops.
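A NumPy sketch of Algorithm 4.3.4 (NumPy and the helper names are assumptions). The second function replays the stored interchanges and multipliers on the original H, which is one way to confirm that $M_{n-1}P_{n-1}\cdots M_1P_1H$ reproduces the computed U.

```python
import numpy as np


def hessenberg_lu(H):
    """Return (F, piv): F holds U on and above the diagonal, multipliers below."""
    H = np.array(H, dtype=float)
    n = H.shape[0]
    piv = np.zeros(n - 1, dtype=int)
    for k in range(n - 1):
        if abs(H[k, k]) < abs(H[k + 1, k]):
            piv[k] = 1
            H[[k, k + 1], k:] = H[[k + 1, k], k:]   # swap rows k, k+1 in columns k:n
        if H[k, k] != 0:
            t = -H[k + 1, k] / H[k, k]
            H[k + 1, k + 1:] += t * H[k, k + 1:]
            H[k + 1, k] = t                          # store the multiplier
    return H, piv


def apply_factors(H0, F, piv):
    """Form M_{n-1} P_{n-1} ... M_1 P_1 H0 from the stored piv and multipliers."""
    X = np.array(H0, dtype=float)
    for k in range(len(piv)):
        if piv[k]:
            X[[k, k + 1], :] = X[[k + 1, k], :]
        X[k + 1, :] += F[k + 1, k] * X[k, :]
    return X
```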


4.3.5 Band Cholesky 


The rest of this section is devoted to banded $Ax = b$ problems where the matrix $A$ is also symmetric positive definite. The fact that pivoting is unnecessary for such matrices leads to some very compact, elegant algorithms. In particular, it follows from Theorem 4.3.1 that if $A = GG^T$ is the Cholesky factorization of $A$, then $G$ has the same lower bandwidth as $A$. This leads to the following banded version of Algorithm 4.2.1, gaxpy-based Cholesky.

Algorithm 4.3.5 (Band Cholesky: Gaxpy Version) Given a symmetric positive definite $A \in \mathbb{R}^{n\times n}$ with bandwidth $p$, the following algorithm computes a lower triangular matrix $G$ with lower bandwidth $p$ such that $A = GG^T$. For all $i \ge j$, G(i,j) overwrites A(i,j).

for j = 1:n
    for k = max(1, j-p):j-1
        λ = min(k+p, n)
        A(j:λ, j) = A(j:λ, j) - A(j,k)A(j:λ, k)
    end
    λ = min(j+p, n)
    A(j:λ, j) = A(j:λ, j)/sqrt(A(j,j))
end


If $n \gg p$ then this algorithm requires about $n(p^2 + 3p)$ flops and $n$ square roots. Of course, in a serious implementation an appropriate data structure for $A$ should be used. For example, if we just store the nonzero lower triangular part, then a $(p+1)$-by-$n$ array would suffice. (See §1.2.6.)

If our band Cholesky procedure is coupled with appropriate band triangular solve routines then approximately $np^2 + 7np + 2n$ flops and $n$ square roots are required to solve $Ax = b$. For small $p$ it follows that the square roots represent a significant portion of the computation and it is preferable to use the $LDL^T$ approach. Indeed, a careful flop count of the steps $A = LDL^T$, $Ly = b$, $Dz = y$, and $L^Tx = z$ reveals that $np^2 + 8np + n$ flops and no square roots are needed.
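A sketch of Algorithm 4.3.5 (Band Cholesky: Gaxpy Version), assuming NumPy and, like the pseudocode, a full n-by-n array rather than the compact (p + 1)-by-n storage just mentioned; `band_cholesky` is our name. Comparing the result with `numpy.linalg.cholesky` is a convenient check, since the Cholesky factor with positive diagonal is unique.

```python
import numpy as np


def band_cholesky(A, p):
    """Gaxpy band Cholesky; returns lower triangular G with lower bandwidth p."""
    A = np.array(A, dtype=float)
    n = A.shape[0]
    for j in range(n):
        # subtract contributions of the (at most p) earlier columns that reach row j
        for k in range(max(0, j - p), j):
            lam = min(k + p, n - 1)
            A[j:lam + 1, j] -= A[j, k] * A[j:lam + 1, k]
        lam = min(j + p, n - 1)
        A[j:lam + 1, j] /= np.sqrt(A[j, j])
    return np.tril(A)
```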


4.3.6 Tridiagonal System Solving

As a sample narrow band $LDL^T$ solution procedure, we look at the case of symmetric positive definite tridiagonal systems. Setting

$$L = \begin{bmatrix} 1 & & & 0 \\ e_1 & 1 & & \\ & \ddots & \ddots & \\ 0 & & e_{n-1} & 1 \end{bmatrix}$$

and $D = \mathrm{diag}(d_1, \ldots, d_n)$ we deduce from the equation $A = LDL^T$ that:

$$\begin{aligned} a_{11} &= d_1 \\ a_{k,k-1} &= e_{k-1}d_{k-1}, && k = 2:n \\ a_{kk} &= d_k + e_{k-1}^2d_{k-1} = d_k + e_{k-1}a_{k,k-1}, && k = 2:n \end{aligned}$$

Thus, the $d_i$ and $e_i$ can be resolved as follows:

d(1) = a(1,1)
for k = 2:n
    e(k-1) = a(k,k-1)/d(k-1); d(k) = a(k,k) - e(k-1)a(k,k-1)
end

To obtain the solution to $Ax = b$ we solve $Ly = b$, $Dz = y$, and $L^Tx = z$. With overwriting we obtain


Algorithm 4.3.6 (Symmetric, Tridiagonal, Positive Definite System Solver) Given an n-by-n symmetric, tridiagonal, positive definite matrix $A$ and $b \in \mathbb{R}^n$, the following algorithm overwrites $b$ with the solution to $Ax = b$. It is assumed that the diagonal of $A$ is stored in d(1:n) and the superdiagonal in e(1:n-1).

for k = 2:n
    t = e(k-1); e(k-1) = t/d(k-1); d(k) = d(k) - t e(k-1)
end
for k = 2:n
    b(k) = b(k) - e(k-1)b(k-1)
end
b(n) = b(n)/d(n)
for k = n-1:-1:1
    b(k) = b(k)/d(k) - e(k)b(k+1)
end

This algorithm requires $8n$ flops.
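A direct transcription of Algorithm 4.3.6 as a NumPy sketch (NumPy and the name `spd_tridiag_solve` are assumptions). It copies its inputs rather than overwriting them, and uses 0-based indices, so e[k] is the (k, k+1) entry of A.

```python
import numpy as np


def spd_tridiag_solve(d, e, b):
    """Solve A x = b for SPD tridiagonal A with diagonal d and superdiagonal e."""
    d = np.array(d, dtype=float)
    e = np.array(e, dtype=float)
    b = np.array(b, dtype=float)
    n = d.size
    for k in range(1, n):                # A = L D L^T, overwriting d and e
        t = e[k - 1]
        e[k - 1] = t / d[k - 1]
        d[k] = d[k] - t * e[k - 1]
    for k in range(1, n):                # L y = b
        b[k] = b[k] - e[k - 1] * b[k - 1]
    b[n - 1] = b[n - 1] / d[n - 1]       # D z = y and L^T x = z combined
    for k in range(n - 2, -1, -1):
        b[k] = b[k] / d[k] - e[k] * b[k + 1]
    return b
```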


4.3.7 Vectorization Issues 


The tridiagonal example brings up a sore point: narrow band problems and 
vector/pipeline architectures do not mix well. The narrow band implies 
short vectors. However, it is sometimes the case that large, independent 
sets of such problems must be solved at the same time. Let us look at how 
such a computation should be arranged in light of the issues raised in §1.4. 

For simplicity, assume that we must solve the n-by-n unit lower bidiagonal systems

$$A^{(k)}x^{(k)} = b^{(k)}, \qquad k = 1:m$$

and that $m \gg n$. Suppose we have arrays E(1:n-1, 1:m) and B(1:n, 1:m) with the property that E(1:n-1, k) houses the subdiagonal of $A^{(k)}$ and B(1:n, k) houses the $k$th right-hand side $b^{(k)}$. We can overwrite $b^{(k)}$ with the solution $x^{(k)}$ as follows:


for k = 1:m
    for i = 2:n
        B(i,k) = B(i,k) - E(i-1,k)B(i-1,k)
    end
end


The problem with this algorithm, which sequentially solves each bidiagonal system in turn, is that the inner loop does not vectorize. This is because of the dependence of B(i,k) on B(i-1,k). If we interchange the k and i loops we get

for i = 2:n
    for k = 1:m
        B(i,k) = B(i,k) - E(i-1,k)B(i-1,k)          (4.3.1)
    end
end

Now the inner loop vectorizes well as it involves a vector multiply and a vector add. Unfortunately, (4.3.1) is not a unit stride procedure. However, this problem is easily rectified if we store the subdiagonals and right-hand sides by row. That is, we use the arrays E(1:m, 1:n-1) and B(1:m, 1:n) and store the subdiagonal of $A^{(k)}$ in E(k, 1:n-1) and $b^{(k)}$ in B(k, 1:n). The computation (4.3.1) then transforms to

for i = 2:n
    for k = 1:m
        B(k,i) = B(k,i) - E(k,i-1)B(k,i-1)
    end
end


illustrating once again the effect of data structure on performance. 
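In NumPy the vectorized inner loop of the last version becomes a single array statement over all m systems. The sketch below (NumPy assumed; the function name is ours) stores E and B by row as in the text and, in the usage shown, creates them in column-major order with `numpy.asfortranarray`, so that, as in the Fortran-style layout the text has in mind, each column B[:, i] is contiguous. With NumPy's default row-major layout the unit-stride choice would instead be to store the transposes.

```python
import numpy as np


def solve_bidiagonal_batch(E, B):
    """Overwrite B(k, :) with the solution of the kth unit lower bidiagonal system.

    E(k, :) holds the subdiagonal of A^(k); B(k, :) holds b^(k).
    """
    n = B.shape[1]
    for i in range(1, n):
        # the loop over k = 1:m is one vector multiply and one vector subtract
        B[:, i] -= E[:, i - 1] * B[:, i - 1]
    return B


rng = np.random.default_rng(10)
m, n = 1000, 6
E = np.asfortranarray(rng.standard_normal((m, n - 1)))
B = np.asfortranarray(rng.standard_normal((m, n)))
```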


4.3.8 Band Matrix Data Structures 


The above algorithms are written as if the matrix $A$ is conventionally stored in an n-by-n array. In practice, a band linear equation solver would be organized around a data structure that takes advantage of the many zeros in $A$. Recall from §1.2.6 that if $A$ has lower bandwidth $p$ and upper bandwidth $q$ it can be represented in a $(p+q+1)$-by-$n$ array A.band where band entry $a_{ij}$ is stored in A.band(i-j+q+1, j). In this arrangement, the nonzero portion of $A$'s $j$th column is housed in the $j$th column of A.band. Another possible band matrix data structure that we discussed in §1.2.8 involves storing $A$ by diagonal in a 1-dimensional array A.diag. Regardless of the data structure adopted, the design of a matrix computation with a band storage arrangement requires care in order to minimize subscripting overheads.
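A sketch of the A.band layout in NumPy, using 0-based indices so that $a_{ij}$ lands in band[i - j + q, j] (NumPy and the helper names are assumptions). As far as we know, this is also the layout that SciPy's `scipy.linalg.solve_banded` expects (its documented convention is `ab[u + i - j, j] == a[i, j]`), but the sketch does not depend on SciPy. The matrix-vector product shows the kind of subscripting a band-stored computation must manage.

```python
import numpy as np


def to_band(A, p, q):
    """Pack A (lower bandwidth p, upper bandwidth q) into a (p+q+1)-by-n array."""
    n = A.shape[0]
    band = np.zeros((p + q + 1, n))
    for j in range(n):
        for i in range(max(0, j - q), min(n, j + p + 1)):
            band[i - j + q, j] = A[i, j]
    return band


def band_matvec(band, p, q, x):
    """y = A x computed directly from the band array."""
    n = band.shape[1]
    y = np.zeros(n)
    for j in range(n):
        for i in range(max(0, j - q), min(n, j + p + 1)):
            y[i] += band[i - j + q, j] * x[j]
    return y
```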


Problems 


P4.3.1 Derive a banded $LDM^T$ procedure similar to Algorithm 4.3.1.
P4.3.2 Show how the output of Algorithm 4.3.4 can be used to solve the upper Hessenberg system $Hx = b$.
P4.3.3 Give an algorithm for solving an unsymmetric tridiagonal system $Ax = b$ that uses Gaussian elimination with partial pivoting. It should require only four n-vectors of floating point storage for the factorization.
P4.3.4 For $C \in \mathbb{R}^{n\times n}$ define the profile indices $m(C,i) = \min\{j : c_{ij} \ne 0\}$, where $i = 1:n$. Show that if $A = GG^T$ is the Cholesky factorization of $A$, then $m(A,i) = m(G,i)$ for $i = 1:n$. (We say that $G$ has the same profile as $A$.)
P4.3.5 Suppose $A \in \mathbb{R}^{n\times n}$ is symmetric positive definite with profile indices $m_i = m(A,i)$ where $i = 1:n$. Assume that $A$ is stored in a one-dimensional array $v$ as follows: $v = (a_{11}, a_{2,m_2}, \ldots, a_{22}, a_{3,m_3}, \ldots, a_{33}, \ldots, a_{n,m_n}, \ldots, a_{nn})$. Write an algorithm that overwrites $v$ with the corresponding entries of the Cholesky factor $G$ and then uses this factorization to solve $Ax = b$. How many flops are required?
P4.3.6 For $C \in \mathbb{R}^{n\times n}$ define $p(C,i) = \max\{j : c_{ij} \ne 0\}$. Suppose that $A \in \mathbb{R}^{n\times n}$ has an LU factorization $A = LU$ and that:

$$m(A,1) \le m(A,2) \le \cdots \le m(A,n)$$
$$p(A,1) \le p(A,2) \le \cdots \le p(A,n)$$

Show that $m(A,i) = m(L,i)$ and $p(A,i) = p(U,i)$ for $i = 1:n$. Recall the definition of $m(A,i)$ from P4.3.4.
P4.3.7 Develop a gaxpy version of Algorithm 4.3.1.
P4.3.8 Develop a unit stride, vectorizable algorithm for solving the symmetric positive definite tridiagonal systems $A^{(k)}x^{(k)} = b^{(k)}$. Assume that the diagonals, superdiagonals, and right-hand sides are stored by row in arrays D, E, and B and that $b^{(k)}$ is overwritten with $x^{(k)}$.
P4.3.9 Develop a version of Algorithm 4.3.1 in which $A$ is stored by diagonal.
P4.3.10 Give an example of a 3-by-3 symmetric positive definite matrix whose tridiagonal part is not positive definite.
P4.3.11 Consider the $Ax = b$ problem where

$$A = \begin{bmatrix} 2 & -1 & 0 & \cdots & 0 & -1 \\ -1 & 2 & -1 & \cdots & 0 & 0 \\ 0 & -1 & 2 & \ddots & & \vdots \\ \vdots & & \ddots & \ddots & \ddots & 0 \\ 0 & 0 & & \ddots & 2 & -1 \\ -1 & 0 & \cdots & 0 & -1 & 2 \end{bmatrix}.$$

This kind of matrix arises in boundary value problems with periodic boundary conditions. (a) Show $A$ is singular. (b) Give conditions that $b$ must satisfy for there to exist a solution and specify an algorithm for solving it. (c) Assume that $n$ is even and consider the permutation

$$P = [\,e_1\ e_n\ e_2\ e_{n-1}\ e_3\ \cdots\,]$$

where $e_k$ is the $k$th column of $I_n$. Describe the transformed system $P^TAP(P^Tx) = P^Tb$ and show how to solve it. Assume that there is a solution and ignore pivoting.


Notes and References for Sec. 4.3 


The literature concerned with banded systems is immense. Some representative papers include

R.S. Martin and J.H. Wilkinson (1965). “Symmetric Decomposition of Positive Definite Band Matrices,” Numer. Math. 7, 355-61.

R.S. Martin and J.H. Wilkinson (1967). “Solution of Symmetric and Unsymmetric Band Equations and the Calculation of Eigenvalues of Band Matrices,” Numer. Math. 9, 279-301.

N.J. Higham (1990). “Bounding the Error in Gaussian Elimination for Tridiagonal Systems,” SIAM J. Matrix Anal. Appl. 11, 521-530.


A topic of considerable interest in the area of banded matrices deals with methods for reducing the width of the band. See

E. Cuthill (1972). “Several Strategies for Reducing the Bandwidth of Matrices,” in Sparse Matrices and Their Applications, ed. D.J. Rose and R.A. Willoughby, Plenum Press, New York.

N.E. Gibbs, W.G. Poole, Jr., and P.K. Stockmeyer (1976). “A Comparison of Several

As we mentioned, tridiagonal systems arise with particular frequency. Thus, it is not surprising that a great deal of attention has been focused on special methods for this class of banded problems.


C. Fischer and R.A. Usmani (1969). “Properties of Some Tridiagonal Matrices and Their 
Application to Boundary Value Problems,” SIAM J. Num. Anal. 6, 127-42. 

D.J. Rose (1969). “An Algorithm for Solving a Special Class of Tridiagonal Systems of 
Linear Equations,” Comm. ACM 12, 234-36. 

H.S. Stone (1973). “An Efficient Parallel Algorithm for the Solution of a Tridiagonal

M.A. Malcolm and J. Palmer (1974). “A Fast Method for Solving a Class of Tridiagonal Systems of Linear Equations,” Comm. ACM 17, 14-17.

J. Lambiotte and R.G. Voigt (1975). “The Solution of Tridiagonal Linear Systems on the CDC-STAR 100 Computer,” ACM Trans. Math. Soft. 1, 308-29.

H.S. Stone (1975). “Parallel Tridiagonal Equation Solvers,” ACM Trans. Math. Soft. 1,

D. Kershaw (1982). “Solution of Single Tridiagonal Linear Systems and Vectorization of the ICCG Algorithm on the Cray-1,” in G. Rodrigue (ed), Parallel Computation, Academic Press, NY, 1982.

N.J. Higham (1986). “Efficient Algorithms for computing the condition number of a 
tridiagonal matrix,” SIAM J. Sci. and Stat. Comp. 7, 150-165. 


Chapter 4 of George and Liu (1981) contains a nice survey of band methods for positive 
definite systems. 
4.4 Symmetric Indefinite Systems


A symmetric matrix whose quadratic form $x^TAx$ takes on both positive and negative values is called indefinite. Although an indefinite $A$ may have an $LDL^T$ factorization, the entries in the factors can have arbitrary magnitude:

$$\begin{bmatrix} \epsilon & 1 \\ 1 & 0 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 1/\epsilon & 1 \end{bmatrix}\begin{bmatrix} \epsilon & 0 \\ 0 & -1/\epsilon \end{bmatrix}\begin{bmatrix} 1 & 1/\epsilon \\ 0 & 1 \end{bmatrix}.$$

Of course, any of the pivot strategies in §3.4 could be invoked. However, they destroy symmetry and, with it, the chance for a “Cholesky speed” indefinite system solver. Symmetric pivoting, i.e., data reshufflings of the form $A \leftarrow PAP^T$, must be used as we discussed in §4.2.9. Unfortunately, symmetric pivoting does not always stabilize the $LDL^T$ computation. If $\epsilon_1$ and $\epsilon_2$ are small then regardless of $P$, the matrix

$$\tilde{A} = P\begin{bmatrix} \epsilon_1 & 1 \\ 1 & \epsilon_2 \end{bmatrix}P^T$$


has small diagonal entries and large numbers surface in the factorization. 
With symmetric pivoting, the pivots are always selected from the diagonal 
and trouble results if these numbers are small relative to what must be 
zeroed off the diagonal. Thus, LDLT with symmetric pivoting cannot be 
recommended as a reliable approach to symmetric indefinite system solving. 
It seems that the challenge is to involve the off-diagonal entries in the 
pivoting process while at the same time maintaining symmetry. 

In this section we discuss two ways to do this. The first method is due to Aasen (1971) and it computes the factorization

$$PAP^T = LTL^T \qquad (4.4.1)$$

where $L = (\ell_{ij})$ is unit lower triangular and $T$ is tridiagonal. $P$ is a permutation chosen such that $|\ell_{ij}| \le 1$. In contrast, the diagonal pivoting method due to Bunch and Parlett (1971) computes a permutation $P$ such that

$$PAP^T = LDL^T \qquad (4.4.2)$$

where $D$ is a direct sum of 1-by-1 and 2-by-2 pivot blocks. Again, $P$ is chosen so that the entries in the unit lower triangular $L$ satisfy $|\ell_{ij}| \le 1$. Both factorizations involve $n^3/3$ flops and, once computed, can be used to solve $Ax = b$ with $O(n^2)$ work:

$$PAP^T = LTL^T,\ Lz = Pb,\ Tw = z,\ L^Ty = w,\ x = P^Ty \;\Rightarrow\; Ax = b$$

$$PAP^T = LDL^T,\ Lz = Pb,\ Dw = z,\ L^Ty = w,\ x = P^Ty \;\Rightarrow\; Ax = b$$


The only things “new” to discuss in these solution procedures are the $Tw = z$ and $Dw = z$ systems.

In Aasen's method, the symmetric indefinite tridiagonal system $Tw = z$ is solved in $O(n)$ time using band Gaussian elimination with pivoting. Note that there is no serious price to pay for the disregard of symmetry at this level since the overall process is $O(n^3)$.

In the diagonal pivoting approach, the Dw = z system amounts to a set 
of 1-by-1 and 2-by-2 symmetric indefinite systems. The 2-by-2 problems 
can be handled via Gaussian elimination with pivoting. Again, there is no 
harm in disregarding symmetry during this O(n) phase of the calculation. 

Thus, the central issue in this section is the efficient computation of the 
factorizations (4.4.1) and (4.4.2). 


4.4.1 The Parlett-Reid Algorithm 


Parlett and Reid (1970) show how to compute (4.4.1) using Gauss transforms. Their algorithm is sufficiently illustrated by displaying the $k = 2$ step for the case $n = 5$. At the beginning of this step the matrix $A$ has been transformed to

$$A^{(1)} = M_1P_1AP_1^TM_1^T = \begin{bmatrix} \alpha_1 & \beta_1 & 0 & 0 & 0 \\ \beta_1 & \alpha_2 & v_3 & v_4 & v_5 \\ 0 & v_3 & \times & \times & \times \\ 0 & v_4 & \times & \times & \times \\ 0 & v_5 & \times & \times & \times \end{bmatrix}$$

where $P_1$ is a permutation chosen so that the entries in the Gauss transformation $M_1$ are bounded by unity in modulus. Scanning the vector $(v_3\ v_4\ v_5)^T$ for its largest entry, we now determine a 3-by-3 permutation $\tilde{P}_2$ such that

$$\tilde{P}_2\begin{bmatrix} v_3 \\ v_4 \\ v_5 \end{bmatrix} = \begin{bmatrix} \tilde{v}_3 \\ \tilde{v}_4 \\ \tilde{v}_5 \end{bmatrix} \;\Rightarrow\; |\tilde{v}_3| = \max\{|\tilde{v}_3|, |\tilde{v}_4|, |\tilde{v}_5|\}.$$

If this maximal element is zero, we set $M_2 = P_2 = I$ and proceed to the next step. Otherwise, we set $P_2 = \mathrm{diag}(I_2, \tilde{P}_2)$ and $M_2 = I - \alpha^{(2)}e_3^T$ with

$$\alpha^{(2)} = (0\ \ 0\ \ 0\ \ \tilde{v}_4/\tilde{v}_3\ \ \tilde{v}_5/\tilde{v}_3)^T$$

and observe that

$$A^{(2)} = M_2P_2A^{(1)}P_2^TM_2^T = \begin{bmatrix} \alpha_1 & \beta_1 & 0 & 0 & 0 \\ \beta_1 & \alpha_2 & \tilde{v}_3 & 0 & 0 \\ 0 & \tilde{v}_3 & \times & \times & \times \\ 0 & 0 & \times & \times & \times \\ 0 & 0 & \times & \times & \times \end{bmatrix}.$$

In general, the process continues for $n - 2$ steps leaving us with a tridiagonal matrix

$$T = A^{(n-2)} = (M_{n-2}P_{n-2}\cdots M_1P_1)A(M_{n-2}P_{n-2}\cdots M_1P_1)^T.$$

It can be shown that (4.4.1) holds with $P = P_{n-2}\cdots P_1$ and

$$L = (M_{n-2}P_{n-2}\cdots M_1P_1P^T)^{-1}.$$

Analysis of $L$ reveals that its first column is $e_1$ and that its subdiagonal entries in column $k$ with $k > 1$ are “made up” of the multipliers in $M_{k-1}$.

The efficient implementation of the Parlett-Reid method requires care when computing the update

$$A^{(k)} = M_k(P_kA^{(k-1)}P_k^T)M_k^T. \qquad (4.4.3)$$

To see what is involved with a minimum of notation, suppose $B = B^T$ has order $n - k$ and that we wish to form $B_+ = (I - we_1^T)B(I - we_1^T)^T$ where $w \in \mathbb{R}^{n-k}$ and $e_1$ is the first column of $I_{n-k}$. Such a calculation is at the heart of (4.4.3). If we set

$$u = Be_1 - \frac{b_{11}}{2}w,$$

then the lower half of the symmetric matrix $B_+ = B - wu^T - uw^T$ can be formed in $2(n-k)^2$ flops. Summing this quantity as $k$ ranges from 1 to $n - 2$ indicates that the Parlett-Reid procedure requires $2n^3/3$ flops, twice what we would like.


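Example 4.4.1 If the Parlett-Reid algorithm is applied to

$$A = \begin{bmatrix} 0 & 1 & 2 & 3 \\ 1 & 2 & 2 & 2 \\ 2 & 2 & 3 & 3 \\ 3 & 2 & 3 & 4 \end{bmatrix}$$

then

$$P_1 = [\,e_1\ e_4\ e_3\ e_2\,], \qquad M_1 = I_4 - (0, 0, 2/3, 1/3)^Te_2^T,$$
$$P_2 = [\,e_1\ e_2\ e_4\ e_3\,], \qquad M_2 = I_4 - (0, 0, 0, 1/2)^Te_3^T,$$

and $PAP^T = LTL^T$, where $P = [\,e_1\ e_3\ e_4\ e_2\,]$,

$$L = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 1/3 & 1 & 0 \\ 0 & 2/3 & 1/2 & 1 \end{bmatrix} \quad\text{and}\quad T = \begin{bmatrix} 0 & 3 & 0 & 0 \\ 3 & 4 & 2/3 & 0 \\ 0 & 2/3 & 10/9 & 0 \\ 0 & 0 & 0 & 1/2 \end{bmatrix}.$$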
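The Parlett-Reid step can be sketched in NumPy and checked against Example 4.4.1. This is an illustration under our own naming (`parlett_reid`, `solve_via_ltlt`); it updates the trailing block with the u-vector formula behind (4.4.3) but forms the full symmetric update rather than only its lower half, and it returns P as a permutation vector `perm` with A[perm][:, perm] = LTL^T. The solver follows the recipe Lz = Pb, Tw = z, L^Ty = w, x = P^Ty, using `numpy.linalg.solve` as a dense stand-in for the triangular and tridiagonal solves.

```python
import numpy as np


def parlett_reid(A):
    """Return perm, L, T with A[perm][:, perm] = L @ T @ L.T (symmetric A)."""
    A = np.array(A, dtype=float)
    n = A.shape[0]
    perm = np.arange(n)
    L = np.eye(n)
    for k in range(n - 2):
        # P_k: move the largest |A(i,k)|, i > k, into position k+1
        q = k + 1 + int(np.argmax(np.abs(A[k + 1:, k])))
        A[[k + 1, q], :] = A[[q, k + 1], :]
        A[:, [k + 1, q]] = A[:, [q, k + 1]]
        L[[k + 1, q], :k + 1] = L[[q, k + 1], :k + 1]   # permute earlier multipliers
        perm[[k + 1, q]] = perm[[q, k + 1]]
        if A[k + 1, k] == 0:
            continue                                     # M_k = I
        # M_k = I - w e_1^T on the trailing block B = A(k+1:, k+1:)
        w = np.zeros(n - k - 1)
        w[1:] = A[k + 2:, k] / A[k + 1, k]
        B = A[k + 1:, k + 1:]
        u = B[:, 0] - 0.5 * B[0, 0] * w                  # u = B e_1 - (b_11/2) w
        A[k + 1:, k + 1:] = B - np.outer(w, u) - np.outer(u, w)
        A[k + 2:, k] = 0.0
        A[k, k + 2:] = 0.0
        L[k + 2:, k + 1] = w[1:]
    T = np.triu(np.tril(A, 1), -1)
    return perm, L, T


def solve_via_ltlt(perm, L, T, b):
    """Solve A x = b from A[perm][:, perm] = L T L^T."""
    z = np.linalg.solve(L, b[perm])      # L z = P b
    w = np.linalg.solve(T, z)            # T w = z
    y = np.linalg.solve(L.T, w)          # L^T y = w
    x = np.empty_like(y)
    x[perm] = y                          # x = P^T y
    return x
```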


4.4.2 The Method of Aasen

An $n^3/3$ approach to computing (4.4.1) due to Aasen (1971) can be derived by reconsidering some of the computations in the Parlett-Reid approach. We need a notation for the tridiagonal $T$:

$$T = \begin{bmatrix} \alpha_1 & \beta_1 & \cdots & 0 \\ \beta_1 & \alpha_2 & \ddots & \vdots \\ \vdots & \ddots & \ddots & \beta_{n-1} \\ 0 & \cdots & \beta_{n-1} & \alpha_n \end{bmatrix}$$

For clarity, we temporarily ignore pivoting and assume that the factorization $A = LTL^T$ exists where $L$ is unit lower triangular with $L(:,1) = e_1$. Aasen's method is organized as follows:

for j = 1:n
    Compute h(1:j) where h = TL^T e_j = He_j.
    Compute α(j).
    if j ≤ n-1
        Compute β(j)                                  (4.4.4)
    end
    if j ≤ n-2
        Compute L(j+2:n, j+1).
    end
end


Thus, the mission of the $j$th Aasen step is to compute the $j$th column of $T$ and the $(j+1)$-st column of $L$. The algorithm exploits the fact that the matrix $H = TL^T$ is upper Hessenberg. As can be deduced from (4.4.4), the computation of α(j), β(j), and L(j+2:n, j+1) hinges upon the vector h(1:j) = H(1:j, j). Let us see why.

Consider the $j$th column of the equation $A = LH$:

    A(:, j) = L(:, 1:j+1)h(1:j+1).                              (4.4.5)

This says that A(:, j) is a linear combination of the first $j+1$ columns of $L$. In particular,

    A(j+1:n, j) = L(j+1:n, 1:j)h(1:j) + L(j+1:n, j+1)h(j+1).

It follows that if we compute

    v(j+1:n) = A(j+1:n, j) - L(j+1:n, 1:j)h(1:j),

then

    L(j+1:n, j+1)h(j+1) = v(j+1:n).                             (4.4.6)




Thus, L(j+2:n, j+1) is a scaling of v(j+2:n). Since $L$ is unit lower triangular we have from (4.4.6) that

    v(j+1) = h(j+1)

and so from that same equation we obtain the following recipe for the $(j+1)$-st column of $L$:

    L(j+2:n, j+1) = v(j+2:n)/v(j+1).

Note that L(j+2:n, j+1) is a scaled gaxpy.

We next develop formulae for α(j) and β(j). Compare the (j, j) and (j+1, j) entries in the equation $H = TL^T$. With the convention β(0) = 0 we find that h(j) = β(j-1)L(j, j-1) + α(j) and h(j+1) = β(j), and so

    α(j) = h(j) - β(j-1)L(j, j-1)
    β(j) = v(j+1).

With these recipes we can completely describe the Aasen procedure:

for j = 1:n
    Compute h(1:j) where h = TL^T e_j.
    if j = 1 ∨ j = 2
        α(j) = h(j)
    else
        α(j) = h(j) - β(j-1)L(j, j-1)
    end
    if j ≤ n-1                                                  (4.4.7)
        v(j+1:n) = A(j+1:n, j) - L(j+1:n, 1:j)h(1:j)
        β(j) = v(j+1)
    end
    if j ≤ n-2
        L(j+2:n, j+1) = v(j+2:n)/v(j+1)
    end
end


To complete the description we must detail the computation of h(1:j). From (4.4.5) it follows that

    A(1:j, j) = L(1:j, 1:j)h(1:j).                              (4.4.8)

This lower triangular system can be solved for h(1:j) since we know the first $j$ columns of $L$. However, a much more efficient way to compute H(1:j, j) is obtained by exploiting the $j$th column of the equation $H = TL^T$. In particular, with the convention that β(0)L(j, 0) = 0 we have

    h(k) = β(k-1)L(j, k-1) + α(k)L(j, k) + β(k)L(j, k+1)

for k = 1:j. These are working formulae except in the case k = j because we have not yet computed α(j) and β(j). However, once h(1:j-1) is known we can obtain h(j) from the last row of the triangular system (4.4.8), i.e.,

$$h(j) = A(j,j) - \sum_{k=1}^{j-1} L(j,k)h(k).$$

Collecting results and using a work array ℓ(1:n) for L(j, 1:j) we see that the computation of h(1:j) in (4.4.7) can be organized as follows:

if j = 1
    h(1) = A(1,1)
elseif j = 2
    h(1) = β(1); h(2) = A(2,2)                                  (4.4.9)
else
    ℓ(0) = 0; ℓ(1) = 0; ℓ(2:j-1) = L(j, 2:j-1); ℓ(j) = 1
    h(j) = A(j,j)
    for k = 1:j-1
        h(k) = β(k-1)ℓ(k-1) + α(k)ℓ(k) + β(k)ℓ(k+1)
        h(j) = h(j) - ℓ(k)h(k)
    end
end

Note that with this O(j) method for computing h(1:j), the gaxpy calculation of v(j+1:n) is the dominant operation in (4.4.7). During the $j$th step this gaxpy involves about $2j(n-j)$ flops. Summing this for $j = 1:n$ shows that Aasen's method requires $n^3/3$ flops. Thus, the Aasen and Cholesky algorithms entail the same amount of arithmetic.


4.4.3  Pivoting in Aasen’s Method 


As it now stands, the columns of L are scalings of the v-vectors in (4.4.7). 
If any of these scalings are large, i.e., if any of the v(j + 1)'s are small, 
then we are in trouble. To circumvent this problem we need only permute 
the largest component of v(j + 1:n) to the top position. Of course, this 
permutation must be suitably applied to the unreduced portion of A and 
the previously computed portion of L. 


Algorithm 4.4.1 (Aasen's Method) If $A \in \mathbb{R}^{n\times n}$ is symmetric then
the following algorithm computes a permutation P, a unit lower triangular
L, and a tridiagonal T such that $PAP^T = LTL^T$ with $|L(i,j)| \le 1$. The
permutation P is encoded in an integer vector piv. In particular, $P =
P_1\cdots P_{n-2}$ where $P_j$ is the identity with rows piv(j) and j+1 interchanged.
The diagonal and subdiagonal of T are stored in α(1:n) and β(1:n−1),
respectively. Only the subdiagonal portion of L(2:n, 2:n) is computed.


    for j = 1:n
        Compute h(1:j) via (4.4.9).
        if j = 1 ∨ j = 2
            α(j) = h(j)
        else
            α(j) = h(j) − β(j−1)L(j, j−1)
        end
        if j ≤ n−1
            v(j+1:n) = A(j+1:n, j) − L(j+1:n, 1:j)h(1:j)
            Find q so |v(q)| = ‖v(j+1:n)‖_∞ with j+1 ≤ q ≤ n.
            piv(j) = q;  v(j+1) ↔ v(q);  L(j+1, 2:j) ↔ L(q, 2:j)
            A(j+1, j+1:n) ↔ A(q, j+1:n)
            A(j+1:n, j+1) ↔ A(j+1:n, q)
            β(j) = v(j+1)
        end
        if j ≤ n−2
            L(j+2:n, j+1) = v(j+2:n)
            if v(j+1) ≠ 0
                L(j+2:n, j+1) = L(j+2:n, j+1)/v(j+1)
            end
        end
    end


Aasen's method is stable in the same sense that Gaussian elimination with
partial pivoting is stable. That is, the exact factorization of a matrix near
A is obtained provided $\|\hat T\|_2/\|A\|_2 \approx 1$, where $\hat T$ is the computed version
of the tridiagonal matrix T. In general, this is almost always the case.
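To make Algorithm 4.4.1 concrete, here is a minimal Python sketch. It assumes a dense symmetric matrix stored as a list of lists and uses 0-based indexing, so book index j corresponds to j−1 here. For simplicity it swaps whole rows and columns of a working copy rather than only the trailing parts. Columns that have already been processed are never read again, so this does no harm, and it keeps the invariant A[i][k] = A0[perm[i]][perm[k]]. The names `aasen` and `ltlt` are illustrative only. `ltlt` is a checking helper and is not part of the algorithm.

```python
def aasen(A):
    """Aasen's method with pivoting, after Algorithm 4.4.1 (0-based indices).

    A is a symmetric n-by-n list of lists and is not modified. Returns
    (perm, alpha, beta, L) with A[perm[i]][perm[k]] equal, up to roundoff,
    to (L T L^T)[i][k], where T = tridiag(beta, alpha, beta) and |L[i][k]| <= 1.
    """
    n = len(A)
    A = [row[:] for row in A]                    # working copy, permuted in place
    L = [[1.0 if i == k else 0.0 for k in range(n)] for i in range(n)]
    alpha, beta = [0.0] * n, [0.0] * max(n - 1, 0)
    perm = list(range(n))
    for j in range(n):
        # h(0:j) = (T L^T)(0:j, j) by the O(j) recipe (4.4.9)
        h = [0.0] * (j + 1)
        if j == 0:
            h[0] = A[0][0]
        elif j == 1:
            h[0], h[1] = beta[0], A[1][1]
        else:
            ell = L[j]                           # ell[0] == 0 because L(:,0) = e_0
            h[j] = A[j][j]
            for k in range(j):
                h[k] = alpha[k] * ell[k] + beta[k] * ell[k + 1]
                if k > 0:
                    h[k] += beta[k - 1] * ell[k - 1]
                h[j] -= ell[k] * h[k]
        alpha[j] = h[j] if j <= 1 else h[j] - beta[j - 1] * L[j][j - 1]
        if j <= n - 2:
            # v(j+1:n) = A(j+1:n, j) - L(j+1:n, 0:j) h(0:j)
            v = [0.0] * n
            for i in range(j + 1, n):
                v[i] = A[i][j] - sum(L[i][k] * h[k] for k in range(j + 1))
            q = max(range(j + 1, n), key=lambda i: abs(v[i]))
            if q != j + 1:                       # move the largest |v| to position j+1
                v[j + 1], v[q] = v[q], v[j + 1]
                L[j + 1][1:j + 1], L[q][1:j + 1] = L[q][1:j + 1], L[j + 1][1:j + 1]
                A[j + 1], A[q] = A[q], A[j + 1]  # symmetric row/column interchange
                for row in A:
                    row[j + 1], row[q] = row[q], row[j + 1]
                perm[j + 1], perm[q] = perm[q], perm[j + 1]
            beta[j] = v[j + 1]
            if j <= n - 3:
                for i in range(j + 2, n):
                    L[i][j + 1] = v[i] / v[j + 1] if v[j + 1] != 0 else 0.0
    return perm, alpha, beta, L


def ltlt(L, alpha, beta):
    """Form L T L^T (helper for checking the factorization)."""
    n = len(alpha)
    T = [[0.0] * n for _ in range(n)]
    for i in range(n):
        T[i][i] = alpha[i]
        if i + 1 < n:
            T[i][i + 1] = T[i + 1][i] = beta[i]
    LT = [[sum(L[i][k] * T[k][m] for k in range(n)) for m in range(n)] for i in range(n)]
    return [[sum(LT[i][k] * L[m][k] for k in range(n)) for m in range(n)] for i in range(n)]
```

The pivot search only scans v(j+1:n), so it adds O(n^2) comparisons overall and leaves the n^3/3 flop count unchanged.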

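In a practical implementation of the Aasen algorithm, the lower triangular
portion of A would be overwritten with L and T. Here is the n = 5
case:

$$\begin{bmatrix} \alpha_1 & & & & \\ \beta_1 & \alpha_2 & & & \\ \ell_{32} & \beta_2 & \alpha_3 & & \\ \ell_{42} & \ell_{43} & \beta_3 & \alpha_4 & \\ \ell_{52} & \ell_{53} & \ell_{54} & \beta_4 & \alpha_5 \end{bmatrix}$$

Notice that the columns of L are shifted left in this arrangement.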
4.4.4 Diagonal Pivoting Methods
We next describe the computation of the block $LDL^T$ factorization (4.4.2).
We follow the discussion in Bunch and Parlett (1971). Suppose

$$P_1AP_1^T = \begin{bmatrix} E & C^T \\ C & B \end{bmatrix}, \qquad E \in \mathbb{R}^{s\times s},\; B \in \mathbb{R}^{(n-s)\times(n-s)},$$

where $P_1$ is a permutation matrix and s = 1 or 2. If A is nonzero, then it is
always possible to choose these quantities so that E is nonsingular, thereby
enabling us to write

$$P_1AP_1^T = \begin{bmatrix} I_s & 0 \\ CE^{-1} & I_{n-s} \end{bmatrix}\begin{bmatrix} E & 0 \\ 0 & B - CE^{-1}C^T \end{bmatrix}\begin{bmatrix} I_s & E^{-1}C^T \\ 0 & I_{n-s} \end{bmatrix}.$$


For the sake of stability, the s-by-s “pivot” E should be chosen so that the
entries in

$$\tilde A = (\tilde a_{ij}) = B - CE^{-1}C^T \qquad (4.4.10)$$

are suitably bounded. To this end, let α ∈ (0,1) be given and define the
size measures

$$\mu_0 = \max_{i,j} |a_{ij}|, \qquad \mu_1 = \max_i |a_{ii}|.$$


The Bunch-Parlett pivot strategy is as follows:

    if μ1 ≥ αμ0
        s = 1
        Choose P1 so |e11| = μ1.
    else
        s = 2
        Choose P1 so |e21| = μ0.
    end

It is easy to verify from (4.4.10) that if s = 1 then

$$|\tilde a_{ij}| \le (1 + \alpha^{-1})\,\mu_0 \qquad (4.4.11)$$

while s = 2 implies

$$|\tilde a_{ij}| \le \frac{3-\alpha}{1-\alpha}\,\mu_0. \qquad (4.4.12)$$
By equating $(1+\alpha^{-1})^2$, the growth factor associated with two s = 1 steps,
and $(3-\alpha)/(1-\alpha)$, the corresponding s = 2 factor, Bunch and Parlett
conclude that $\alpha = (1+\sqrt{17})/8$ is optimum from the standpoint of minimizing
the bound on element growth.
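The optimal α follows from a short calculation. Equate the two growth factors and clear denominators. This sketch assumes only that 0 < α < 1, so that 1 − α ≠ 0:

```latex
\left(1+\frac{1}{\alpha}\right)^{2} = \frac{3-\alpha}{1-\alpha}
\iff (1+\alpha)^{2}(1-\alpha) = \alpha^{2}(3-\alpha)
\iff 1+\alpha-\alpha^{2}-\alpha^{3} = 3\alpha^{2}-\alpha^{3}
\iff 4\alpha^{2}-\alpha-1 = 0
\iff \alpha = \frac{1+\sqrt{17}}{8} \approx 0.6404 \quad (\text{the root in } (0,1)).
```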

The reductions outlined above are then repeated on the (n−s)-order
symmetric matrix $\tilde A$. A simple induction argument establishes that the
factorization (4.4.2) exists and that $n^3/3$ flops are required if the work
associated with pivot determination is ignored.


4.4.5 Stability and Efficiency 


Diagonal pivoting with the above strategy is shown by Bunch (1971) to be
as stable as Gaussian elimination with complete pivoting. Unfortunately,
the overall process requires between $n^3/12$ and $n^3/6$ comparisons, since $\mu_0$
involves a two-dimensional search at each stage of the reduction. The actual
number of comparisons depends on the total number of 2-by-2 pivots but
in general the Bunch-Parlett method for computing (4.4.2) is considerably
slower than the technique of Aasen. See Barwell and George (1976).

This is not the case with the diagonal pivoting method of Bunch and 
Kaufman (1977). In their scheme, it is only necessary to scan two columns 
at each stage of the reduction. The strategy is fully illustrated by consid- 
ering the very first step in the reduction: 


    α = (1 + √17)/8;  λ = |a_r1| = max{|a_21|, ..., |a_n1|}
    if λ > 0
        if |a_11| ≥ αλ
            s = 1; P1 = I
        else
            σ = |a_pr| = max{|a_1r|, ..., |a_(r−1)r|, |a_(r+1)r|, ..., |a_nr|}
            if σ|a_11| ≥ αλ²
                s = 1; P1 = I
            elseif |a_rr| ≥ ασ
                s = 1 and choose P1 so (P1^T A P1)_11 = a_rr.
            else
                s = 2 and choose P1 so (P1^T A P1)_21 = a_rp.
            end
        end
    end


Overall, the Bunch-Kaufman algorithm requires $n^3/3$ flops, $O(n^2)$ comparisons,
and, like all the methods of this section, $n^2/2$ storage.
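The first-step decision above needs only two column scans. Below is a minimal Python sketch that computes λ, r, σ and p and returns the pivot choice. It does not form the permutation or perform the reduction. It assumes a dense symmetric matrix stored as a list of lists with 0-based indices, so book index r = 3 appears as r = 2 here. The function name and the returned dictionary layout are illustrative.

```python
import math

ALPHA = (1 + math.sqrt(17)) / 8          # about 0.6404


def bk_first_pivot(A):
    """First-step pivot decision of the Bunch-Kaufman strategy (0-based).

    Returns a dict with the pivot size s, which pivot is used
    ('a11', 'arr', or '2x2'), and the quantities lam, r, sigma, p.
    sigma and p stay None when they are not needed.
    """
    n = len(A)
    info = {"s": 1, "pivot": "a11", "lam": 0.0, "r": None, "sigma": None, "p": None}
    if n < 2:
        return info
    # lambda = |a_r1| = largest subdiagonal magnitude in the first column
    r = max(range(1, n), key=lambda i: abs(A[i][0]))
    lam = abs(A[r][0])
    info.update(lam=lam, r=r)
    if lam == 0 or abs(A[0][0]) >= ALPHA * lam:
        return info                       # s = 1, P1 = I
    # sigma = |a_pr| = largest off-diagonal magnitude in column r
    p = max((i for i in range(n) if i != r), key=lambda i: abs(A[i][r]))
    sigma = abs(A[p][r])
    info.update(sigma=sigma, p=p)
    if sigma * abs(A[0][0]) >= ALPHA * lam ** 2:
        return info                       # s = 1, P1 = I
    if abs(A[r][r]) >= ALPHA * sigma:
        info["pivot"] = "arr"             # s = 1, a_rr moved to the (1,1) position
        return info
    info.update(s=2, pivot="2x2")         # s = 2, a_rp becomes the (2,1) entry
    return info
```

For the matrix of Example 4.4.2 below, this returns s = 2 with λ = 20 and σ = 30.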
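Example 4.4.2 If the Bunch-Kaufman algorithm is applied to

$$A = \begin{bmatrix} 1 & 10 & 20 \\ 10 & 1 & 30 \\ 20 & 30 & 1 \end{bmatrix}$$

then in the first step λ = 20, r = 3, σ = 30, and p = 2. The permutation $P = [\,e_3\; e_2\; e_1\,]$
is applied giving

$$PAP^T = \begin{bmatrix} 1 & 30 & 20 \\ 30 & 1 & 10 \\ 20 & 10 & 1 \end{bmatrix}.$$

A 2-by-2 pivot is then used to produce the reduction

$$PAP^T = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ .3115 & .6563 & 1 \end{bmatrix}\begin{bmatrix} 1 & 30 & 0 \\ 30 & 1 & 0 \\ 0 & 0 & -11.7920 \end{bmatrix}\begin{bmatrix} 1 & 0 & .3115 \\ 0 & 1 & .6563 \\ 0 & 0 & 1 \end{bmatrix}.$$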


4.4.6 A Note on Equilibrium Systems
A very important class of symmetric indefinite matrices have the form

$$A = \begin{bmatrix} C & B \\ B^T & 0 \end{bmatrix}, \qquad C \in \mathbb{R}^{n\times n},\; B \in \mathbb{R}^{n\times p}, \qquad (4.4.13)$$

where C is symmetric positive definite and B has full column rank. These
conditions ensure that A is nonsingular.

Of course, the methods of this section apply to A. However, they do not
exploit its structure because the pivot strategies “wipe out" the zero (2,2)
block. On the other hand, here is a tempting approach that does exploit
A's block structure:

(a) Compute the Cholesky factorization of C, $C = GG^T$.

(b) Solve $GK = B$ for $K \in \mathbb{R}^{n\times p}$.

(c) Compute the Cholesky factorization of $K^TK = B^TC^{-1}B$, $HH^T = K^TK$.

From this it follows that

$$A = \begin{bmatrix} G & 0 \\ K^T & H \end{bmatrix}\begin{bmatrix} G^T & K \\ 0 & -H^T \end{bmatrix}.$$

In principle, this triangular factorization can be used to solve the equilibrium
system

$$\begin{bmatrix} C & B \\ B^T & 0 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} f \\ g \end{bmatrix}. \qquad (4.4.14)$$
However, it is clear by considering steps (b) and (c) above that the accuracy
of the computed solution depends upon κ(C) and this quantity may be
much greater than κ(A). The situation has been carefully analyzed and
various structure-exploiting algorithms have been proposed. A brief review
of the literature is given at the end of the section.
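The three-step block factorization can be traced with a minimal Python sketch. It assumes C is diagonal with positive entries, which makes G = C^{1/2} diagonal, and it stores dense matrices as lists of lists. It follows steps (a) through (c) and then does one forward and one backward block substitution with the two triangular factors. This is exactly the approach whose accuracy can degrade with κ(C), so it illustrates the factorization and is not offered as a recommended solver. All function names are illustrative.

```python
import math


def cholesky(M):
    """Return lower triangular G with M = G G^T (M symmetric positive definite)."""
    n = len(M)
    G = [[0.0] * n for _ in range(n)]
    for j in range(n):
        G[j][j] = math.sqrt(M[j][j] - sum(G[j][k] ** 2 for k in range(j)))
        for i in range(j + 1, n):
            G[i][j] = (M[i][j] - sum(G[i][k] * G[j][k] for k in range(j))) / G[j][j]
    return G


def solve_lower(G, rhs):
    """Solve G z = rhs by forward substitution (G lower triangular)."""
    z = []
    for i, bi in enumerate(rhs):
        z.append((bi - sum(G[i][k] * z[k] for k in range(i))) / G[i][i])
    return z


def solve_lower_t(G, rhs):
    """Solve G^T z = rhs by back substitution (G lower triangular)."""
    n = len(rhs)
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (rhs[i] - sum(G[k][i] * z[k] for k in range(i + 1, n))) / G[i][i]
    return z


def equilibrium_solve(c, B, f, g):
    """Solve [[C, B], [B^T, 0]] [x; y] = [f; g] with C = diag(c), c > 0."""
    n, p = len(B), len(B[0])
    gd = [math.sqrt(ci) for ci in c]                              # (a) C = G G^T, G = diag(gd)
    K = [[B[i][j] / gd[i] for j in range(p)] for i in range(n)]   # (b) G K = B
    KtK = [[sum(K[i][r] * K[i][s] for i in range(n)) for s in range(p)] for r in range(p)]
    H = cholesky(KtK)                                             # (c) H H^T = K^T K
    # forward: [G 0; K^T H] [u; w] = [f; g]
    u = [f[i] / gd[i] for i in range(n)]
    w = solve_lower(H, [g[r] - sum(K[i][r] * u[i] for i in range(n)) for r in range(p)])
    # backward: [G^T K; 0 -H^T] [x; y] = [u; w]
    y = [-t for t in solve_lower_t(H, w)]
    x = [(u[i] - sum(K[i][r] * y[r] for r in range(p))) / gd[i] for i in range(n)]
    return x, y
```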

But before we close it is interesting to consider a special case of (4.4.14)
that clarifies what it means for an algorithm to be stable and illustrates
how perturbation analysis can structure the search for better methods.
In several important applications, g = 0, C is diagonal, and the solution
subvector y is of primary importance. A manipulation of (4.4.14) shows
that this vector is specified by

$$y = (B^TC^{-1}B)^{-1}B^TC^{-1}f. \qquad (4.4.15)$$


Looking at this we are again led to believe that κ(C) should have a bearing
on the accuracy of the computed y. However, it can be shown that

$$\|(B^TC^{-1}B)^{-1}B^TC^{-1}\| \le \chi_B \qquad (4.4.16)$$

where the upper bound $\chi_B$ is independent of C, a result that (correctly)
suggests that y is not sensitive to perturbations in C. A stable method for
computing this vector should respect this, meaning that the accuracy of
the computed y should be independent of C. Vavasis (1994) has developed
a method with this property. It involves the careful assembly of a matrix
$V \in \mathbb{R}^{n\times(n-p)}$ whose columns are a basis for the nullspace of $B^TC^{-1}$. The
n-by-n linear system

$$[\,B \;\; V\,]\begin{bmatrix} y \\ q \end{bmatrix} = f$$

is then solved, implying $f = By + Vq$. Thus, $B^TC^{-1}f = B^TC^{-1}By$ and
(4.4.15) holds.


Problems 


P4.4.1 Show that if all the 1-by-1 and 2-by-2 principal submatrices of an n-by-n 
symmetric matrix A are singular, then A is zero. 

P4.4.2 Show that no 2-by-2 pivots can arise in the Bunch-Kaufman algorithm if A is 
positive definite. 

P4.4.3 Arrange Algorithm 4.4.1 so that only the lower triangular portion of A is
referenced and so that α(j) overwrites A(j, j) for j = 1:n, β(j) overwrites A(j+1, j) for
j = 1:n−1, and L(i, j) overwrites A(i, j−1) for j = 2:n−1 and i = j+1:n.

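P4.4.4 Suppose $A \in \mathbb{R}^{n\times n}$ is nonsingular, symmetric, and strictly diagonally dominant.
Give an algorithm that computes the factorization

$$\Pi A\Pi^T = \begin{bmatrix} R & 0 \\ S & M \end{bmatrix}\begin{bmatrix} R^T & S^T \\ 0 & -M^T \end{bmatrix}$$

where $R \in \mathbb{R}^{k\times k}$ and $M \in \mathbb{R}^{(n-k)\times(n-k)}$ are lower triangular and nonsingular and Π is
a permutation.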
P4.4.5 Show that if

$$A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & -A_{22} \end{bmatrix}, \qquad A_{11} \in \mathbb{R}^{n\times n},\; A_{22} \in \mathbb{R}^{p\times p},$$

is symmetric with $A_{11}$ and $A_{22}$ positive definite, then it has an $LDL^T$ factorization with
the property that

$$D = \begin{bmatrix} D_1 & 0 \\ 0 & -D_2 \end{bmatrix}$$

where $D_1 \in \mathbb{R}^{n\times n}$ and $D_2 \in \mathbb{R}^{p\times p}$ have positive diagonal entries.
P4.4.6 Prove (4.4.11) and (4.4.12).
P4.4.7 Show that $-(B^TC^{-1}B)^{-1}$ is the (2,2) block of $A^{-1}$ where A is given by (4.4.13).


P4.4.8 The point of this problem is to consider a special case of (4.4.15). Define the
matrix

$$M(\alpha) = (B^TC^{-1}B)^{-1}B^TC^{-1}$$

where

$$C = I_n + \alpha e_ke_k^T, \qquad \alpha > -1,$$

and $e_k = I_n(:,k)$. (Note that C is just the identity with α added to the (k,k) entry.)
Assume that $B \in \mathbb{R}^{n\times p}$ has rank p and show that

$$M(\alpha) = (B^TB)^{-1}B^T\left(I_n - \frac{\alpha}{1+\alpha w^Tw}\,e_kw^T\right)$$

where $w = (I_n - B(B^TB)^{-1}B^T)e_k$. Show that if $\|w\|_2 = 0$ or $\|w\|_2 = 1$, then
$\|M(\alpha)\|_2 = 1/\sigma_{\min}(B)$. Show that if $0 < \|w\|_2 < 1$, then $\|M(\alpha)\|_2$ is bounded above
by a quantity that depends only on $\|w\|_2$ and $\sigma_{\min}(B)$.

Thus, $\|M(\alpha)\|_2$ has an α-independent upper bound.
tems, there are several results like (4.4.15) that underpin the most effective algorithms.
and the included references. A discussion of (4.4.16) may be found in


G.W. Stewart (1989). “On Scaled Projections and Pseudoinverses,” Lin. Alg. and Its
Applic. 112, 189-193.

D.P. O'Leary (1990). “On Bounds for Scaled Projections and Pseudoinverses,” Lin. Alg.
and Its Applic. 132, 115-117.

M.J. Todd (1990). “A Dantzig-Wolfe-like Variant of Karmarkar's Interior-Point Linear
Programming Algorithm,” Operations Research 38, 1006-1018.


4.5 Block Systems 


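In many application areas the matrices that arise have exploitable block
structure. As a case study we have chosen to discuss block tridiagonal
systems of the form

$$\begin{bmatrix} D_1 & F_1 & & \cdots & 0 \\ E_1 & D_2 & F_2 & & \vdots \\ & \ddots & \ddots & \ddots & \\ \vdots & & E_{n-2} & D_{n-1} & F_{n-1} \\ 0 & \cdots & & E_{n-1} & D_n \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_{n-1} \\ x_n \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_{n-1} \\ b_n \end{bmatrix} \qquad (4.5.1)$$

Here we assume that all blocks are q-by-q and that the $x_i$ and $b_i$ are in
$\mathbb{R}^q$. In this section we discuss both a block LU approach to this problem as
well as a divide and conquer scheme known as cyclic reduction. Kronecker
product systems are briefly mentioned.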


4.5.1 Block Tridiagonal LU Factorization 


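We begin by considering a block LU factorization for the matrix in (4.5.1).
Define the block tridiagonal matrices $A_k$ by

$$A_k = \begin{bmatrix} D_1 & F_1 & \cdots & 0 \\ E_1 & D_2 & \ddots & \vdots \\ \vdots & \ddots & \ddots & F_{k-1} \\ 0 & \cdots & E_{k-1} & D_k \end{bmatrix}, \qquad k = 1{:}n. \qquad (4.5.2)$$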
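Comparing blocks in

$$A_n = \begin{bmatrix} I & & \cdots & 0 \\ L_1 & I & & \vdots \\ \vdots & \ddots & \ddots & \\ 0 & \cdots & L_{n-1} & I \end{bmatrix}\begin{bmatrix} U_1 & F_1 & \cdots & 0 \\ 0 & U_2 & \ddots & \vdots \\ \vdots & & \ddots & F_{n-1} \\ 0 & \cdots & 0 & U_n \end{bmatrix} \qquad (4.5.3)$$

we formally obtain the following algorithm for the $L_i$ and $U_i$:

    U_1 = D_1
    for i = 2:n
        Solve L_{i−1}U_{i−1} = E_{i−1} for L_{i−1}.                 (4.5.4)
        U_i = D_i − L_{i−1}F_{i−1}
    end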


The procedure is defined so long as the $U_i$ are nonsingular. This is assured,
for example, if the matrices $A_1, \ldots, A_n$ are nonsingular.

Having computed the factorization (4.5.3), the vector x in (4.5.1) can
be obtained via block forward and back substitution:

    y_1 = b_1
    for i = 2:n
        y_i = b_i − L_{i−1}y_{i−1}
    end                                                             (4.5.5)
    Solve U_n x_n = y_n for x_n.
    for i = n−1:−1:1
        Solve U_i x_i = y_i − F_i x_{i+1} for x_i.
    end


To carry out both (4.5.4) and (4.5.5), each $U_i$ must be factored since linear
systems involving these submatrices are solved. This could be done using
Gaussian elimination with pivoting. However, this does not guarantee the
stability of the overall process. To see this just consider the case when the
block size q is unity.
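A minimal Python sketch of (4.5.4) and (4.5.5) follows. It assumes dense q-by-q blocks stored as lists of lists, with right-hand-side blocks stored as q-by-1 column matrices and 0-based block indices. In this layout E[i] sits in block position (i+1, i) and F[i] in block position (i, i+1). Each system involving a U_i is handled here by Gaussian elimination with partial pivoting. As the text warns, that choice does not by itself make the overall process stable. The helper names are illustrative.

```python
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def matsub(X, Y):
    return [[X[i][j] - Y[i][j] for j in range(len(X[0]))] for i in range(len(X))]


def transpose(X):
    return [list(col) for col in zip(*X)]


def solve(A, Bm):
    """Return A^{-1} Bm by Gaussian elimination with partial pivoting."""
    n, m = len(A), len(Bm[0])
    M = [A[i][:] + Bm[i][:] for i in range(n)]          # augmented [A | Bm]
    for k in range(n):
        piv = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[piv] = M[piv], M[k]
        for i in range(k + 1, n):
            t = M[i][k] / M[k][k]
            for j in range(k, n + m):
                M[i][j] -= t * M[k][j]
    X = [[0.0] * m for _ in range(n)]
    for i in reversed(range(n)):
        for j in range(m):
            X[i][j] = (M[i][n + j] - sum(M[i][k] * X[k][j] for k in range(i + 1, n))) / M[i][i]
    return X


def block_tridiag_solve(D, E, F, b):
    """Solve (4.5.1) by block LU (4.5.4) and block substitution (4.5.5)."""
    n = len(D)
    U, Lblk = [D[0]], []
    for i in range(1, n):
        # L_{i-1} U_{i-1} = E_{i-1}  <=>  U_{i-1}^T L_{i-1}^T = E_{i-1}^T
        Li = transpose(solve(transpose(U[i - 1]), transpose(E[i - 1])))
        Lblk.append(Li)
        U.append(matsub(D[i], matmul(Li, F[i - 1])))
    y = [b[0]]
    for i in range(1, n):
        y.append(matsub(b[i], matmul(Lblk[i - 1], y[i - 1])))
    x = [None] * n
    x[n - 1] = solve(U[n - 1], y[n - 1])
    for i in range(n - 2, -1, -1):
        x[i] = solve(U[i], matsub(y[i], matmul(F[i], x[i + 1])))
    return x
```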


4.5.2 Block Diagonal Dominance 


In order to obtain satisfactory bounds on the $L_i$ and $U_i$ it is necessary
to make additional assumptions about the underlying block matrix. For
example, if for i = 1:n we have the block diagonal dominance relations

$$\|D_i^{-1}\|_1\left(\|F_{i-1}\|_1 + \|E_i\|_1\right) < 1, \qquad E_n \equiv F_0 \equiv 0, \qquad (4.5.6)$$

then the factorization (4.5.3) exists and it is possible to show that the $L_i$
and $U_i$ satisfy the inequalities

$$\|L_i\|_1 \le 1 \qquad (4.5.7)$$

$$\|U_i\|_1 \le \|A_n\|_1. \qquad (4.5.8)$$


4.5.3 Block Versus Band Solving


At this point it is reasonable to ask why we do not simply regard the matrix
A in (4.5.1) as a qn-by-qn matrix having scalar entries and bandwidth
2q − 1. Band Gaussian elimination as described in §4.3 could be applied.
The effectiveness of this course of action depends on such things as the
dimensions of the blocks and the sparsity patterns within each block.


To illustrate this in a very simple setting, suppose that we wish to solve

$$\begin{bmatrix} D_1 & F_1 \\ E_1 & D_2 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \end{bmatrix} \qquad (4.5.9)$$

where $D_1$ and $D_2$ are diagonal and $F_1$ and $E_1$ are tridiagonal. Assume
that each of these blocks is n-by-n and that it is "safe" to solve (4.5.9) via
(4.5.3) and (4.5.5). Note that

    U_1 = D_1                    (diagonal)
    L_1 = E_1 U_1^{-1}           (tridiagonal)
    U_2 = D_2 − L_1 F_1          (pentadiagonal)
    y_1 = b_1
    y_2 = b_2 − E_1(D_1^{-1} y_1)
    U_2 x_2 = y_2
    D_1 x_1 = y_1 − F_1 x_2.

Consequently, some very simple n-by-n calculations with the original banded
blocks render the solution.

On the other hand, the naive application of band Gaussian elimination
to the system (4.5.9) would entail a great deal of unnecessary work and
storage as the system has bandwidth n + 1. However, we mention that by
permuting the rows and columns of the system via the permutation

$$P = [\,e_1,\; e_{n+1},\; e_2,\; e_{n+2},\; \ldots,\; e_n,\; e_{2n}\,] \qquad (4.5.10)$$
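we find (in the n = 5 case) that

$$PAP^T = \begin{bmatrix}
\times & \times & 0 & \times & 0 & 0 & 0 & 0 & 0 & 0 \\
\times & \times & \times & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & \times & \times & \times & 0 & \times & 0 & 0 & 0 & 0 \\
\times & 0 & \times & \times & \times & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & \times & \times & \times & 0 & \times & 0 & 0 \\
0 & 0 & \times & 0 & \times & \times & \times & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & \times & \times & \times & 0 & \times \\
0 & 0 & 0 & 0 & \times & 0 & \times & \times & \times & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & \times & \times & \times \\
0 & 0 & 0 & 0 & 0 & 0 & \times & 0 & \times & \times
\end{bmatrix}$$

This matrix has upper and lower bandwidth equal to three and so a very
reasonable solution procedure results by applying band Gaussian elimination
to this permuted version of A.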
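The bandwidth claims can be checked mechanically. The following Python sketch builds the nonzero pattern of (4.5.9), assuming every diagonal entry of D_1 and D_2 and every tridiagonal entry of E_1 and F_1 is nonzero. It then reorders the unknowns as x_1(1), x_2(1), x_1(2), x_2(2), ..., which is the interleaving produced by (4.5.10), and measures both bandwidths. Function names are illustrative.

```python
def pattern_4_5_9(n):
    """Nonzero positions (0-based) of [[D1, F1], [E1, D2]] with D1, D2 diagonal
    and E1, F1 tridiagonal, each block n-by-n."""
    nz = set()
    for i in range(n):
        nz.add((i, i))                                 # D1
        nz.add((n + i, n + i))                         # D2
        for j in range(max(0, i - 1), min(n, i + 2)):
            nz.add((i, n + j))                         # F1
            nz.add((n + i, j))                         # E1
    return nz


def interleave(nz, n):
    """Reorder unknowns as x1(1), x2(1), x1(2), x2(2), ... (the ordering behind (4.5.10))."""
    pos = lambda k: 2 * k if k < n else 2 * (k - n) + 1
    return {(pos(i), pos(j)) for i, j in nz}


def bandwidths(nz):
    """Return (lower, upper) bandwidth of a sparsity pattern."""
    return max(i - j for i, j in nz), max(j - i for i, j in nz)


def show(nz, size):
    """Print the pattern with 'x' for a nonzero and '0' otherwise."""
    for i in range(size):
        print(" ".join("x" if (i, j) in nz else "0" for j in range(size)))
```

For n = 5, `bandwidths(pattern_4_5_9(5))` gives (6, 6), that is n + 1. After interleaving it gives (3, 3), and `show(interleave(pattern_4_5_9(5), 5), 10)` prints the pattern displayed above.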

The subject of bandwidth-reducing permutations is important. See
George and Liu (1981, Chapter 4). We also refer the reader to Varah
(1972) and George (1974) for further details concerning the solution of block
tridiagonal systems.


4.5.4 Block Cyclic Reduction 


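We next describe the method of block cyclic reduction that can be used
to solve some important special instances of the block tridiagonal system
(4.5.1). For simplicity, we assume that A has the form

$$A = \begin{bmatrix} D & F & \cdots & 0 \\ F & D & \ddots & \vdots \\ \vdots & \ddots & \ddots & F \\ 0 & \cdots & F & D \end{bmatrix} \in \mathbb{R}^{nq\times nq} \qquad (4.5.11)$$

where F and D are q-by-q matrices that satisfy DF = FD. We also assume
that $n = 2^k - 1$. These conditions hold in certain important applications
such as the discretisation of Poisson's equation on a rectangle. In that
situation,

$$D = \begin{bmatrix} 4 & -1 & \cdots & 0 \\ -1 & 4 & \ddots & \vdots \\ \vdots & \ddots & \ddots & -1 \\ 0 & \cdots & -1 & 4 \end{bmatrix} \qquad (4.5.12)$$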
and $F = -I_q$. The integer n is determined by the size of the mesh and can
often be chosen to be of the form $n = 2^k - 1$. (Sweet (1977) shows how to
proceed when the dimension is not of this form.)

The basic idea behind cyclic reduction is to halve the dimension of the
problem at hand repeatedly until we are left with a single q-by-q system
for the unknown subvector $x_{2^{k-1}}$. This system is then solved by standard
means. The previously eliminated $x_i$ are found by a back-substitution
process.

The general procedure is adequately motivated by considering the case
n = 7:

$$\begin{aligned} b_1 &= Dx_1 + Fx_2 \\ b_2 &= Fx_1 + Dx_2 + Fx_3 \\ b_3 &= Fx_2 + Dx_3 + Fx_4 \\ b_4 &= Fx_3 + Dx_4 + Fx_5 \\ b_5 &= Fx_4 + Dx_5 + Fx_6 \\ b_6 &= Fx_5 + Dx_6 + Fx_7 \\ b_7 &= Fx_6 + Dx_7 \end{aligned} \qquad (4.5.13)$$

For i = 2, 4, and 6 we multiply equations i − 1, i, and i + 1 by F, −D, and
F, respectively, and add the resulting equations to obtain

$$\begin{aligned} (2F^2 - D^2)x_2 + F^2x_4 &= F(b_1 + b_3) - Db_2 \\ F^2x_2 + (2F^2 - D^2)x_4 + F^2x_6 &= F(b_3 + b_5) - Db_4 \\ F^2x_4 + (2F^2 - D^2)x_6 &= F(b_5 + b_7) - Db_6 \end{aligned}$$
Thus, with this tactic we have removed the odd-indexed $x_i$ and are left
with a reduced block tridiagonal system of the form

$$\begin{aligned} D^{(1)}x_2 + F^{(1)}x_4 &= b_2^{(1)} \\ F^{(1)}x_2 + D^{(1)}x_4 + F^{(1)}x_6 &= b_4^{(1)} \\ F^{(1)}x_4 + D^{(1)}x_6 &= b_6^{(1)} \end{aligned}$$

where $D^{(1)} = 2F^2 - D^2$ and $F^{(1)} = F^2$ commute. Applying the same elimination
strategy as above, we multiply these three equations respectively
by $F^{(1)}$, $-D^{(1)}$, and $F^{(1)}$. When these transformed equations are added
together, we obtain the single equation

$$\left(2[F^{(1)}]^2 - [D^{(1)}]^2\right)x_4 = F^{(1)}\left(b_2^{(1)} + b_6^{(1)}\right) - D^{(1)}b_4^{(1)}$$

which we write as

$$D^{(2)}x_4 = b^{(2)}.$$

This completes the cyclic reduction. We now solve this (small) q-by-q system
for $x_4$. The vectors $x_2$ and $x_6$ are then found by solving the systems

$$D^{(1)}x_2 = b_2^{(1)} - F^{(1)}x_4, \qquad D^{(1)}x_6 = b_6^{(1)} - F^{(1)}x_4.$$
Finally, we use the first, third, fifth, and seventh equations in (4.5.13) to
compute $x_1$, $x_3$, $x_5$, and $x_7$, respectively.
For general n of the form $n = 2^k - 1$ we set $D^{(0)} = D$, $F^{(0)} = F$, $b^{(0)} = b$
and compute:

    for p = 1:k−1
        F^(p) = [F^(p−1)]^2
        D^(p) = 2F^(p) − [D^(p−1)]^2
        r = 2^p
        for j = 1:2^(k−p) − 1                                       (4.5.14)
            b_{jr}^(p) = F^(p−1)(b_{jr−r/2}^(p−1) + b_{jr+r/2}^(p−1)) − D^(p−1) b_{jr}^(p−1)
        end
    end

The $x_i$ are then computed as follows:

    Solve D^(k−1) x_{2^(k−1)} = b_{2^(k−1)}^(k−1) for x_{2^(k−1)}.
    for p = k−2:−1:0
        r = 2^p
        for j = 1:2^(k−p−1)                                         (4.5.15)
            if j = 1
                c = b_{(2j−1)r}^(p) − F^(p) x_{2jr}
            elseif j = 2^(k−p−1)
                c = b_{(2j−1)r}^(p) − F^(p) x_{(2j−2)r}
            else
                c = b_{(2j−1)r}^(p) − F^(p)(x_{2jr} + x_{(2j−2)r})
            end
            Solve D^(p) x_{(2j−1)r} = c for x_{(2j−1)r}.
        end
    end


The amount of work required to perform these recursions depends greatly
upon the sparsity of the $D^{(p)}$ and $F^{(p)}$. In the worst case when these
matrices are full, the overall flop count has order $\log(n)q^3$. Care must be
exercised in order to ensure stability during the reduction. For further
details, see Buneman (1969).


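Example 4.5.1 Suppose q = 1, D = (4), and F = (−1) in (4.5.14) and that we wish to
solve:

$$\begin{bmatrix} 4 & -1 & 0 & 0 & 0 & 0 & 0 \\ -1 & 4 & -1 & 0 & 0 & 0 & 0 \\ 0 & -1 & 4 & -1 & 0 & 0 & 0 \\ 0 & 0 & -1 & 4 & -1 & 0 & 0 \\ 0 & 0 & 0 & -1 & 4 & -1 & 0 \\ 0 & 0 & 0 & 0 & -1 & 4 & -1 \\ 0 & 0 & 0 & 0 & 0 & -1 & 4 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \\ x_7 \end{bmatrix} = \begin{bmatrix} 2 \\ 4 \\ 6 \\ 8 \\ 10 \\ 12 \\ 22 \end{bmatrix}$$

By executing (4.5.14) we obtain the reduced systems:

$$\begin{bmatrix} -14 & 1 & 0 \\ 1 & -14 & 1 \\ 0 & 1 & -14 \end{bmatrix}\begin{bmatrix} x_2 \\ x_4 \\ x_6 \end{bmatrix} = \begin{bmatrix} -24 \\ -48 \\ -80 \end{bmatrix} \qquad p = 1$$

and

$$[\,-194\,]\,[\,x_4\,] = [\,-776\,] \qquad p = 2.$$

The $x_i$ are then determined via (4.5.15):

    p = 2:  x_4 = 4
    p = 1:  x_2 = 2,  x_6 = 6
    p = 0:  x_1 = 1,  x_3 = 3,  x_5 = 5,  x_7 = 7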


Cyclic reduction is an example of a divide and conquer algorithm. Other 
divide and conquer procedures are discussed in §1.3.8 and §8.6. 
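A minimal Python sketch of (4.5.14) and (4.5.15) follows. It is restricted to the scalar case q = 1, where D = d and F = f are numbers, so that commutativity is automatic. It uses 1-based positions as in the text. As a simplification, the zero entries x[0] and x[n+1] stand in for the missing neighbours, which folds the three branches of (4.5.15) into one formula. No Buneman-style stabilization is attempted. The function name is illustrative.

```python
def cyclic_reduction(d, f, rhs):
    """Solve the q = 1 case of (4.5.11) with n = 2^k - 1 unknowns via (4.5.14)-(4.5.15).

    x[0] and x[n+1] are zero padding for the missing neighbours in the
    j = 1 and last-j cases of (4.5.15).
    """
    n = len(rhs)
    k = (n + 1).bit_length() - 1
    if n < 1 or n != 2 ** k - 1:
        raise ValueError("n must have the form 2^k - 1")
    D, F = [float(d)], [float(f)]
    bs = [[0.0] + [float(t) for t in rhs]]         # bs[p][i] holds b_i^(p), 1-based i
    for p in range(1, k):                          # reduction (4.5.14)
        F.append(F[p - 1] ** 2)
        D.append(2 * F[p] - D[p - 1] ** 2)
        r, prev = 2 ** p, bs[p - 1]
        cur = [0.0] * (n + 1)
        for j in range(1, 2 ** (k - p)):
            i = j * r
            cur[i] = F[p - 1] * (prev[i - r // 2] + prev[i + r // 2]) - D[p - 1] * prev[i]
        bs.append(cur)
    x = [0.0] * (n + 2)
    top = 2 ** (k - 1)
    x[top] = bs[k - 1][top] / D[k - 1]
    for p in range(k - 2, -1, -1):                 # back substitution (4.5.15)
        r = 2 ** p
        for j in range(1, 2 ** (k - p - 1) + 1):
            i = (2 * j - 1) * r
            c = bs[p][i] - F[p] * (x[i - r] + x[i + r])
            x[i] = c / D[p]
    return x[1:n + 1]
```

With d = 4, f = −1 and the right-hand side of Example 4.5.1, the intermediate values match the reduced systems shown there (D^(1) = −14, D^(2) = −194) and the result is x = (1, 2, ..., 7).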


4.5.5 Kronecker Product Systems
If $B \in \mathbb{R}^{m\times n}$ and $C \in \mathbb{R}^{p\times q}$, then their Kronecker product is given by

$$A = B\otimes C = \begin{bmatrix} b_{11}C & b_{12}C & \cdots & b_{1n}C \\ b_{21}C & b_{22}C & \cdots & b_{2n}C \\ \vdots & \vdots & & \vdots \\ b_{m1}C & b_{m2}C & \cdots & b_{mn}C \end{bmatrix}.$$

Thus, A is an m-by-n block matrix whose (i, j) block is $b_{ij}C$. Kronecker
products arise in conjunction with various mesh discretisations and throughout
signal processing. Some of the more important properties that the
Kronecker product satisfies include


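$$(A\otimes B)(C\otimes D) = AC\otimes BD \qquad (4.5.16)$$

$$(A\otimes B)^T = A^T\otimes B^T \qquad (4.5.17)$$

$$(A\otimes B)^{-1} = A^{-1}\otimes B^{-1} \qquad (4.5.18)$$

where it is assumed that all the factor operations are defined.
Related to the Kronecker product is the “vec” operation. If $X \in \mathbb{R}^{m\times n}$, then

$$\mathrm{vec}(X) = \begin{bmatrix} X(:,1) \\ \vdots \\ X(:,n) \end{bmatrix} \in \mathbb{R}^{mn}.$$

Thus, the vec of a matrix amounts to a “stacking” of its columns. It can
be shown that

$$Y = CXB^T \iff \mathrm{vec}(Y) = (B\otimes C)\,\mathrm{vec}(X). \qquad (4.5.19)$$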
It follows that solving a Kronecker product system,

$$(B\otimes C)x = d,$$

is equivalent to solving the matrix equation $CXB^T = D$ for X where
$x = \mathrm{vec}(X)$ and $d = \mathrm{vec}(D)$. This has efficiency ramifications. To illustrate,
suppose $B, C \in \mathbb{R}^{n\times n}$ are symmetric positive definite. If $A = B\otimes C$ is
treated as a general matrix and factored in order to solve for x, then $O(n^6)$
flops are required since $B\otimes C \in \mathbb{R}^{n^2\times n^2}$. On the other hand, the solution
approach

1. Compute the Cholesky factorizations $B = GG^T$ and $C = HH^T$.
2. Solve $BZ = D^T$ for Z using G.
3. Solve $CX = Z^T$ for X using H.
4. $x = \mathrm{vec}(X)$.

involves $O(n^3)$ flops. Note that

$$B\otimes C = GG^T\otimes HH^T = (G\otimes H)(G\otimes H)^T$$

is the Cholesky factorization of $B\otimes C$ because the Kronecker product of a
pair of lower triangular matrices is lower triangular. Thus, the above four-step
solution approach is a structure-exploiting Cholesky method applied
to $B\otimes C$.

We mention that if B is sparse, then $B\otimes C$ has the same sparsity at the
block level. For example, if B is tridiagonal, then $B\otimes C$ is block tridiagonal.
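The four-step solution approach can be sketched in Python with the standard library alone. The sketch assumes B and C are n-by-n symmetric positive definite lists of lists, and that d = vec(D) is stored column by column as in the definition of vec. The `kron` helper forms B ⊗ C explicitly only so that the result can be checked against a direct product. The fast path never builds it. All names are illustrative.

```python
import math


def cholesky(M):
    """Return lower triangular G with M = G G^T."""
    n = len(M)
    G = [[0.0] * n for _ in range(n)]
    for j in range(n):
        G[j][j] = math.sqrt(M[j][j] - sum(G[j][k] ** 2 for k in range(j)))
        for i in range(j + 1, n):
            G[i][j] = (M[i][j] - sum(G[i][k] * G[j][k] for k in range(j))) / G[j][j]
    return G


def chol_solve(G, rhs):
    """Solve (G G^T) z = rhs given the Cholesky factor G."""
    n = len(rhs)
    y = [0.0] * n
    for i in range(n):
        y[i] = (rhs[i] - sum(G[i][k] * y[k] for k in range(i))) / G[i][i]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (y[i] - sum(G[k][i] * z[k] for k in range(i + 1, n))) / G[i][i]
    return z


def kron(B, C):
    """Kronecker product B (x) C: the (i, j) block is B[i][j] * C."""
    return [[B[i][j] * C[k][l] for j in range(len(B[0])) for l in range(len(C[0]))]
            for i in range(len(B)) for k in range(len(C))]


def kron_spd_solve(B, C, d):
    """Solve (B (x) C) x = d for SPD n-by-n B and C with the four-step method."""
    n = len(B)
    Dm = [[d[j * n + i] for j in range(n)] for i in range(n)]   # d = vec(D), column-major
    G, H = cholesky(B), cholesky(C)                             # 1. B = G G^T, C = H H^T
    Zcols = [chol_solve(G, Dm[c]) for c in range(n)]            # 2. B Z = D^T (column c of D^T is row c of D)
    Xcols = [chol_solve(H, [Zcols[r][c] for r in range(n)])     # 3. C X = Z^T (column c of Z^T is row c of Z)
             for c in range(n)]
    return [v for col in Xcols for v in col]                    # 4. x = vec(X)
```

Each solve in steps 2 and 3 costs O(n^2) once the factors are known, and there are n of them. Together with the two O(n^3) Cholesky factorizations, this gives the O(n^3) total quoted above.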


Problems 


P4.5.1 Show that a block diagonally dominant matrix is nonsingular.
P4.5.2 Verify that (4.5.6) implies (4.5.7) and (4.5.8).

P4.5.3 Suppose block cyclic reduction is applied with D given by (4.5.12) and $F = -I_q$.
What can you say about the band structure of the matrices $F^{(p)}$ and $D^{(p)}$ that arise?

P4.5.4 Suppose $A \in \mathbb{R}^{n\times n}$ is nonsingular and that we have solutions to the linear
systems Ax = b and Ay = g where $b, g \in \mathbb{R}^n$ are given. Show how to solve the system

$$\begin{bmatrix} A & g \\ h^T & \alpha \end{bmatrix}\begin{bmatrix} x \\ \mu \end{bmatrix} = \begin{bmatrix} b \\ \beta \end{bmatrix}$$

in O(n) flops where α, β ∈ ℝ and $h \in \mathbb{R}^n$ are given and the matrix of coefficients $A_+$ is
nonsingular. The advisability of going for such a quick solution is a complicated issue
that depends upon the condition numbers of A and $A_+$ and other factors.

P4.5.5 Verify (4.5.16)-(4.5.19).

P4.5.7 Show how to construct the SVD of $B\otimes C$ from the SVDs of B and C.
P4.5.8 If A, B, and C are matrices, then it can be shown that $(A\otimes B)\otimes C = A\otimes(B\otimes C)$
and so we just write $A\otimes B\otimes C$ for this matrix. Show how to solve the linear system
$(A\otimes B\otimes C)x = d$ assuming that A, B, and C are symmetric positive definite.


Notes and References for Sec. 4.5 


The following papers provide insight into the various nuances of block matrix computations:


J.M. Varah (1972). “On the Solution of Block-Tridiagonal Systems Arising from Certain
Finite-Difference Equations,” Math. Comp. 26, 859-68.

J.A. George (1974). “On Block Elimination for Sparse Linear Systems,” SIAM J. Num.
Anal. 11, 585-603.

R. Fourer (1984). “Staircase Matrices and Systems,” SIAM Review 26, 1-71.

M.L. Merriam (1985). “On the Factorization of Block Tridiagonals With Storage Constraints,”
SIAM J. Sci. and Stat. Comp. 6, 182-192.


The property of block diagonal dominance and its various implications is the central 
theme in 


D.G. Feingold and R.S. Varga (1962). “Block Diagonally Dominant Matrices and Gen- 
eralizations of the Gershgorin Circle Theorem," Pacific J. Math. 12, 1241-50. 


Early methods that involve the idea of cyclic reduction are described in 


R.W. Hockney (1965). “A Fast Direct Solution of Poisson's Equation Using Fourier
Analysis,” J. ACM 12, 95-113.

B.L. Buzbee, G.H. Golub, and C.W. Nielson (1970). “On Direct Methods for Solving
Poisson's Equations,” SIAM J. Num. Anal. 7, 627-56.


The accumulation of the right-hand side must be done with great care, for otherwise
there would be a significant loss of accuracy. A stable way of doing this is described in


O. Buneman (1968). “A Compact Non-Iterative Poisson Solver,” Report 294, Stanford 
University Institute for Plasma Research, Stanford, California. 


Other literature concerned with cyclic reduction includes 


F.W. Dorr (1970). “The Direct Solution of the Discrete Poisson Equation on a Rectangle,”
SIAM Review 12, 248-63.

B.L. Buzbee, F.W. Dorr, J.A. George, and G.H. Golub (1971). “The Direct Solution of
the Discrete Poisson Equation on Irregular Regions,” SIAM J. Num. Anal. 8, 722-36.

F.W. Dorr (1973). “The Direct Solution of the Discrete Poisson Equation in O(n²)
Operations,” SIAM Review 15, 412-415.

P. Concus and G.H. Golub (1973). “Use of Fast Direct Methods for the Efficient Numerical
Solution of Nonseparable Elliptic Equations,” SIAM J. Num. Anal. 10,
1103-20.

B.L. Buzbee and F.W. Dorr (1974). “The Direct Solution of the Biharmonic Equation
on Rectangular Regions and the Poisson Equation on Irregular Regions,” SIAM J.
Num. Anal. 11, 753-63.

D. Heller (1976). “Some Aspects of the Cyclic Reduction Algorithm for Block Tridiagonal
Linear Systems,” SIAM J. Num. Anal. 13, 484-96.


Various generalizations and extensions to cyclic reduction have been proposed: 


P.N. Swarztrauber and R.A. Sweet (1973). "The Direct Solution of the Discrete Poisson 
Equation on a Disk,” SIAM J. Num. Anal. 10, 900-907. 
R.A. Sweet (1974). “A Generalized Cyclic Reduction Algorithm,” SIAM J. Num. Anal.
11, 506-20.

M.A. Diamond and D.L.V. Ferreira (1976). “On a Cyclic Reduction Method for the
Solution of Poisson's Equation,” SIAM J. Num. Anal. 13, 54-70.

R.A. Sweet (1977). “A Cyclic Reduction Algorithm for Solving Block Tridiagonal Systems
of Arbitrary Dimension,” SIAM J. Num. Anal. 14, 706-20.

P.N. Swarztrauber and R. Sweet (1989). “Vector and Parallel Methods for the Direct
Solution of Poisson's Equation,” J. Comp. Appl. Math. 27, 241-283.

S. Bondeli and W. Gander (1994). “Cyclic Reduction for Special Tridiagonal Systems,”
SIAM J. Matrix Anal. Appl. 15, 321-330.


For certain matrices that arise in conjunction with elliptic partial differential equations,
block elimination corresponds to rather natural operations on the underlying mesh. A
classical example of this is the method of nested dissection described in


A. George (1973). “Nested Dissection of a Regular Finite Element Mesh,” SIAM J. 
Num. Anal. 10, 345-63. 


We also mention the following general survey:


J.R. Bunch (1976). “Block Methods for Solving Sparse Linear Systems,” in Sparse
Matrix Computations, J.R. Bunch and D.J. Rose (eds), Academic Press, New York.


Bordered linear systems as presented in P4.5.4 are discussed in


W. Govaerts and J.D. Pryce (1990). “Block Elimination with One Iterative Refinement
Solves Bordered Linear Systems Accurately,” BIT 30, 490-507.

W. Govaerts (1991). “Stable Solvers and Block Elimination for Bordered Systems,”
SIAM J. Matrix Anal. Appl. 12, 469-483.

W. Govaerts and J.D. Pryce (1993). “Mixed Block Elimination for Linear Systems with
Wider Borders,” IMA J. Num. Anal. 13, 161-180.


Kronecker product references include 


H.C. Andrews and J. Kane (1970). “Kronecker Matrices, Computer Implementation,
and Generalized Spectra,” J. Assoc. Comput. Mach. 17, 260-268.

C. de Boor (1979). “Efficient Computer Manipulation of Tensor Products,” ACM Trans.
Math. Soft. 5, 173-182.

A. Graham (1981). Kronecker Products and Matrix Calculus with Applications, Ellis
Horwood Ltd., Chichester, England.

H.V. Henderson and S.R. Searle (1981). “The Vec-Permutation Matrix, The Vec Operator,
and Kronecker Products: A Review,” Linear and Multilinear Algebra 9, 271-288.

P.A. Regalia and S. Mitra (1989). “Kronecker Products, Unitary Matrices, and Signal
Processing Applications,” SIAM Review 31, 586-613.


4.6 Vandermonde Systems and the FFT 


Suppose $x(0{:}n) \in \mathbb{R}^{n+1}$. A matrix $V \in \mathbb{R}^{(n+1)\times(n+1)}$ of the form

$$V = V(x_0, \ldots, x_n) = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ x_0 & x_1 & \cdots & x_n \\ \vdots & \vdots & & \vdots \\ x_0^n & x_1^n & \cdots & x_n^n \end{bmatrix}$$

is said to be a Vandermonde matrix. In this section, we show how the
systems $V^Ta = f = f(0{:}n)$ and $Vz = b = b(0{:}n)$ can be solved in $O(n^2)$
flops. The discrete Fourier transform is briefly introduced. This special and
extremely important Vandermonde system has a recursive block structure
and can be solved in $O(n\log n)$ flops. In this section, vectors and matrices
are subscripted from 0.


4.6.1 Polynomial Interpolation: $V^Ta = f$


Vandermonde systems arise in many approximation and interpolation problems.
Indeed, the key to obtaining a fast Vandermonde solver is to recognize
that solving $V^Ta = f$ is equivalent to polynomial interpolation. This follows
because if $V^Ta = f$ and

$$p(x) = \sum_{j=0}^{n} a_jx^j \qquad (4.6.1)$$

then $p(x_i) = f_i$ for i = 0:n.

Recall that if the $x_i$ are distinct then there is a unique polynomial of
degree at most n that interpolates $(x_0, f_0), \ldots, (x_n, f_n)$. Consequently, V is
nonsingular as long as the $x_i$ are distinct. We assume this throughout the
section.

The first step in computing the $a_j$ of (4.6.1) is to calculate the Newton
representation of the interpolating polynomial p:

$$p(x) = \sum_{k=0}^{n} c_k\left(\prod_{i=0}^{k-1}(x - x_i)\right). \qquad (4.6.2)$$

The constants $c_k$ are divided differences and may be determined as follows:

    c(0:n) = f(0:n)
    for k = 0:n−1
        for i = n:−1:k+1                                            (4.6.3)
            c_i = (c_i − c_{i−1})/(x_i − x_{i−k−1})
        end
    end

See Conte and de Boor (1980, Chapter 2).
The next task is to generate a(0:n) from c(0:n). Define the polynomials
$p_n(x), \ldots, p_0(x)$ by the iteration

    p_n(x) = c_n
    for k = n−1:−1:0
        p_k(x) = c_k + (x − x_k)p_{k+1}(x)
    end

and observe that $p_0(x) = p(x)$. Writing

$$p_k(x) = a_k^{(k)} + a_{k+1}^{(k)}x + \cdots + a_n^{(k)}x^{n-k}$$

and equating like powers of x in the equation $p_k = c_k + (x - x_k)p_{k+1}$ gives
the following recursion for the coefficients $a_i^{(k)}$:

    a_n^(n) = c_n
    for k = n−1:−1:0
        a_k^(k) = c_k − x_k a_{k+1}^(k+1)
        for i = k+1:n−1
            a_i^(k) = a_i^(k+1) − x_k a_{i+1}^(k+1)
        end
        a_n^(k) = a_n^(k+1)
    end

Consequently, the coefficients $a_i = a_i^{(0)}$ can be calculated as follows:

    a(0:n) = c(0:n)
    for k = n−1:−1:0
        for i = k:n−1                                               (4.6.4)
            a_i = a_i − x_k a_{i+1}
        end
    end

Combining this iteration with (4.6.3) renders the following algorithm:


Algorithm 4.6.1 Given $x(0{:}n) \in \mathbb{R}^{n+1}$ with distinct entries and $f =
f(0{:}n) \in \mathbb{R}^{n+1}$, the following algorithm overwrites f with the solution a =
a(0:n) to the Vandermonde system $V(x_0, \ldots, x_n)^Ta = f$.

    for k = 0:n−1
        for i = n:−1:k+1
            f(i) = (f(i) − f(i−1))/(x(i) − x(i−k−1))
        end
    end
    for k = n−1:−1:0
        for i = k:n−1
            f(i) = f(i) − f(i+1)x(k)
        end
    end


This algorithm requires $5n^2/2$ flops.


Example 4.6.1 Suppose Algorithm 4.6.1 is used to solve

$$\begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 2 & 4 & 8 \\ 1 & 3 & 9 & 27 \\ 1 & 4 & 16 & 64 \end{bmatrix}\begin{bmatrix} a_0 \\ a_1 \\ a_2 \\ a_3 \end{bmatrix} = \begin{bmatrix} 10 \\ 26 \\ 58 \\ 112 \end{bmatrix}.$$

The first k-loop computes the Newton representation of p(x):

$$p(x) = 10 + 16(x-1) + 8(x-1)(x-2) + (x-1)(x-2)(x-3).$$

The second k-loop computes $a = [\,4\;3\;2\;1\,]^T$ from $[\,10\;16\;8\;1\,]^T$.
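Algorithm 4.6.1 translates almost line for line into Python. In this minimal sketch the points x and data f are plain lists with distinct x entries, and the function works on a copy of f instead of overwriting it. The function name is illustrative.

```python
def vandermonde_dual_solve(x, f):
    """Algorithm 4.6.1: return a with V(x_0, ..., x_n)^T a = f (x distinct)."""
    a = [float(t) for t in f]
    n = len(x) - 1
    for k in range(n):                    # divided differences, as in (4.6.3)
        for i in range(n, k, -1):
            a[i] = (a[i] - a[i - 1]) / (x[i] - x[i - k - 1])
    for k in range(n - 1, -1, -1):        # Newton form -> monomial coefficients (4.6.4)
        for i in range(k, n):
            a[i] -= a[i + 1] * x[k]
    return a
```

On the data of Example 4.6.1 the first loop leaves [10, 16, 8, 1] (the Newton coefficients) and the second loop turns this into [4, 3, 2, 1].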


4.6.2 The System Vz = b


Now consider the system Vz = b. To derive an efficient algorithm for this
problem, we describe what Algorithm 4.6.1 does in matrix-vector language.
Define the lower bidiagonal matrix $L_k(\alpha) \in \mathbb{R}^{(n+1)\times(n+1)}$ by

$$L_k(\alpha) = \begin{bmatrix} I_k & 0 \\ 0 & \begin{matrix} 1 & & & 0 \\ -\alpha & 1 & & \\ & \ddots & \ddots & \\ 0 & & -\alpha & 1 \end{matrix} \end{bmatrix}$$

and the diagonal matrix $D_k$ by

$$D_k = \mathrm{diag}(\underbrace{1, \ldots, 1}_{k+1},\; x_{k+1} - x_0,\; \ldots,\; x_n - x_{n-k-1}).$$

With these definitions it is easy to verify from (4.6.3) that if f = f(0:n)
and c = c(0:n) is the vector of divided differences then $c = U^Tf$ where U
is the upper triangular matrix defined by

$$U^T = D_{n-1}^{-1}L_{n-1}(1)\cdots D_0^{-1}L_0(1).$$

Similarly, from (4.6.4) we have

$$a = L^Tc,$$

where L is the unit lower triangular matrix defined by

$$L^T = L_0(x_0)^T\cdots L_{n-1}(x_{n-1})^T.$$

Thus, $a = L^TU^Tf$ where $V^{-T} = L^TU^T$. In other words, Algorithm 4.6.1
solves $V^Ta = f$ by tacitly computing the “UL” factorization of $V^{-1}$.

Consequently, the solution to the system Vz = b is given by

$$z = V^{-1}b = U(Lb) = \left(L_0(1)^TD_0^{-1}\cdots L_{n-1}(1)^TD_{n-1}^{-1}\right)\left(L_{n-1}(x_{n-1})\cdots L_0(x_0)\,b\right).$$
This observation gives rise to the following algorithm:


Algorithm 4.6.2 Given $x(0{:}n) \in \mathbb{R}^{n+1}$ with distinct entries and $b =
b(0{:}n) \in \mathbb{R}^{n+1}$, the following algorithm overwrites b with the solution z =
z(0:n) to the Vandermonde system $V(x_0, \ldots, x_n)z = b$.

    for k = 0:n−1
        for i = n:−1:k+1
            b(i) = b(i) − x(k)b(i−1)
        end
    end
    for k = n−1:−1:0
        for i = k+1:n
            b(i) = b(i)/(x(i) − x(i−k−1))
        end
        for i = k:n−1
            b(i) = b(i) − b(i+1)
        end
    end


This algorithm requires $5n^2/2$ flops.


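Example 4.6.2 Suppose Algorithm 4.6.2 is used to solve

$$\begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 2 & 3 & 4 \\ 1 & 4 & 9 & 16 \\ 1 & 8 & 27 & 64 \end{bmatrix}\begin{bmatrix} z_0 \\ z_1 \\ z_2 \\ z_3 \end{bmatrix} = \begin{bmatrix} 0 \\ -1 \\ 3 \\ 35 \end{bmatrix}.$$

The first k-loop computes the vector

$$L_2(3)L_1(2)L_0(1)\begin{bmatrix} 0 \\ -1 \\ 3 \\ 35 \end{bmatrix} = \begin{bmatrix} 0 \\ -1 \\ 6 \\ 6 \end{bmatrix}.$$

The second k-loop then calculates

$$L_0(1)^TD_0^{-1}L_1(1)^TD_1^{-1}L_2(1)^TD_2^{-1}\begin{bmatrix} 0 \\ -1 \\ 6 \\ 6 \end{bmatrix} = \begin{bmatrix} 3 \\ -4 \\ 0 \\ 1 \end{bmatrix}.$$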
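The two loops of Algorithm 4.6.2 apply Lb and then U to the result. Below is a minimal Python sketch under the same assumptions as before: plain lists, distinct points, and a copy of b rather than in-place overwriting. The function name is illustrative.

```python
def vandermonde_primal_solve(x, b):
    """Algorithm 4.6.2: return z with V(x_0, ..., x_n) z = b (x distinct)."""
    z = [float(t) for t in b]
    n = len(x) - 1
    for k in range(n):                    # z <- L_{n-1}(x_{n-1}) ... L_0(x_0) b
        for i in range(n, k, -1):
            z[i] -= x[k] * z[i - 1]
    for k in range(n - 1, -1, -1):        # z <- L_k(1)^T D_k^{-1} z for k = n-1, ..., 0
        for i in range(k + 1, n + 1):
            z[i] /= x[i] - x[i - k - 1]
        for i in range(k, n):
            z[i] -= z[i + 1]
    return z
```

On the data of Example 4.6.2 the first loop produces [0, −1, 6, 6] and the function returns [3, −4, 0, 1].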


4.6.3 Stability 


Algorithms 4.6.1 and 4.6.2 are discussed and analyzed in Björck and Pereyra
(1970). Their experience is that these algorithms frequently produce surprisingly
accurate solutions, even when V is ill-conditioned. They also
show how to update the solution when a new coordinate pair $(x_{n+1}, f_{n+1})$
is added to the set of points to be interpolated, and how to solve confluent
Vandermonde systems, i.e., systems involving matrices like

$$V = V(x_0, x_1, x_1, x_3) = \begin{bmatrix} 1 & 1 & 0 & 1 \\ x_0 & x_1 & 1 & x_3 \\ x_0^2 & x_1^2 & 2x_1 & x_3^2 \\ x_0^3 & x_1^3 & 3x_1^2 & x_3^3 \end{bmatrix}.$$


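4.6.4 The Fast Fourier Transform
The discrete Fourier transform (DFT) matrix of order n is defined by

$$F_n = (f_{kj}), \qquad f_{kj} = \omega_n^{kj},$$

where

$$\omega_n = \exp(-2\pi i/n) = \cos(2\pi/n) - i\cdot\sin(2\pi/n).$$

The parameter $\omega_n$ is an nth root of unity because $\omega_n^n = 1$. In the n = 4
case, $\omega_4 = -i$ and

$$F_4 = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & \omega_4 & \omega_4^2 & \omega_4^3 \\ 1 & \omega_4^2 & \omega_4^4 & \omega_4^6 \\ 1 & \omega_4^3 & \omega_4^6 & \omega_4^9 \end{bmatrix} = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & -i & -1 & i \\ 1 & -1 & 1 & -1 \\ 1 & i & -1 & -i \end{bmatrix}.$$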


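If $x \in \mathbb{C}^n$, then its DFT is the vector $F_nx$. The DFT has an extremely important role to play throughout applied mathematics and engineering.

If $n$ is highly composite, then it is possible to carry out the DFT in many fewer than the $O(n^2)$ flops required by conventional matrix-vector multiplication. To illustrate this we set $n = 2^t$ and proceed to develop the radix-2 fast Fourier transform (FFT). The starting point is to look at an even-order DFT matrix when we permute its columns so that the even-indexed columns come first. Consider the case $n = 8$. Noting that $\omega_8^{kj} = \omega_8^{kj \bmod 8}$, we have, with $\omega = \omega_8$,

$$F_8 = \begin{bmatrix}
1&1&1&1&1&1&1&1\\
1&\omega&\omega^2&\omega^3&\omega^4&\omega^5&\omega^6&\omega^7\\
1&\omega^2&\omega^4&\omega^6&1&\omega^2&\omega^4&\omega^6\\
1&\omega^3&\omega^6&\omega&\omega^4&\omega^7&\omega^2&\omega^5\\
1&\omega^4&1&\omega^4&1&\omega^4&1&\omega^4\\
1&\omega^5&\omega^2&\omega^7&\omega^4&\omega&\omega^6&\omega^3\\
1&\omega^6&\omega^4&\omega^2&1&\omega^6&\omega^4&\omega^2\\
1&\omega^7&\omega^6&\omega^5&\omega^4&\omega^3&\omega^2&\omega
\end{bmatrix}.$$

If we define the index vector $c = [0\ 2\ 4\ 6\ 1\ 3\ 5\ 7]$, then

$$F_8(:,c) = \left[\begin{array}{cccc|cccc}
1&1&1&1&1&1&1&1\\
1&\omega^2&\omega^4&\omega^6&\omega&\omega^3&\omega^5&\omega^7\\
1&\omega^4&1&\omega^4&\omega^2&\omega^6&\omega^2&\omega^6\\
1&\omega^6&\omega^4&\omega^2&\omega^3&\omega&\omega^7&\omega^5\\ \hline
1&1&1&1&-1&-1&-1&-1\\
1&\omega^2&\omega^4&\omega^6&-\omega&-\omega^3&-\omega^5&-\omega^7\\
1&\omega^4&1&\omega^4&-\omega^2&-\omega^6&-\omega^2&-\omega^6\\
1&\omega^6&\omega^4&\omega^2&-\omega^3&-\omega&-\omega^7&-\omega^5
\end{array}\right].$$

The lines through the matrix are there to help us think of $F_8(:,c)$ as a 2-by-2 matrix with 4-by-4 blocks. Noting that $\omega^2 = \omega_8^2 = \omega_4$ we see that

$$F_8(:,c) = \begin{bmatrix}F_4 & \Omega_4F_4\\ F_4 & -\Omega_4F_4\end{bmatrix}$$

where

$$\Omega_4 = \begin{bmatrix}1&0&0&0\\0&\omega&0&0\\0&0&\omega^2&0\\0&0&0&\omega^3\end{bmatrix}.$$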


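It follows that if $x$ is an 8-vector, then

$$F_8x = F_8(:,c)\,x(c) = \begin{bmatrix}F_4 & \Omega_4F_4\\ F_4 & -\Omega_4F_4\end{bmatrix}\begin{bmatrix}x(0{:}2{:}7)\\ x(1{:}2{:}7)\end{bmatrix} = \begin{bmatrix}F_4x(0{:}2{:}7) + \Omega_4F_4x(1{:}2{:}7)\\ F_4x(0{:}2{:}7) - \Omega_4F_4x(1{:}2{:}7)\end{bmatrix}.$$

Thus, by simple scalings we can obtain the 8-point DFT $y = F_8x$ from the 4-point DFTs $y_T = F_4x(0{:}2{:}7)$ and $y_B = F_4x(1{:}2{:}7)$:

$$y(0{:}3) = y_T + d\,.\!*\,y_B, \qquad y(4{:}7) = y_T - d\,.\!*\,y_B.$$

Here,

$$d = \begin{bmatrix}1\\ \omega\\ \omega^2\\ \omega^3\end{bmatrix}$$

and ".*" indicates componentwise vector multiplication. In general, if $n = 2m$, then $y = F_nx$ is given by

$$y(0{:}m-1) = y_T + d\,.\!*\,y_B, \qquad y(m{:}n-1) = y_T - d\,.\!*\,y_B,$$

where

$$d = [1, \omega_n, \ldots, \omega_n^{m-1}]^T, \qquad y_T = F_mx(0{:}2{:}n-1), \qquad y_B = F_mx(1{:}2{:}n-1).$$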


For $n = 2^t$ we can recur on this process until $n = 1$, for which $F_1x = x$:

    function y = FFT(x, n)
        if n = 1
            y = x
        else
            m = n/2; ω = exp(-2πi/n)
            y_T = FFT(x(0:2:n-1), m); y_B = FFT(x(1:2:n-1), m)
            d = [1, ω, ..., ω^(m-1)]^T; z = d .* y_B
            y = [y_T + z; y_T - z]
        end
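A minimal Python sketch of the recursive function FFT(x, n), using only the standard `cmath` module. The names `fft_recursive` and `dft_direct` are ours; `n` is taken as `len(x)` and is assumed to be a power of two. The O(n^2) routine `dft_direct` simply forms $F_nx$ entry by entry and is included only as a reference for checking.

```python
import cmath

def fft_recursive(x):
    """Radix-2 FFT y = F_n x with omega_n = exp(-2*pi*i/n); len(x) = 2^t."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    m = n // 2
    y_top = fft_recursive(x[0::2])            # F_m x(0:2:n-1)
    y_bot = fft_recursive(x[1::2])            # F_m x(1:2:n-1)
    w = cmath.exp(-2j * cmath.pi / n)
    z = [w ** k * y_bot[k] for k in range(m)]  # z = d .* y_B
    return ([y_top[k] + z[k] for k in range(m)] +
            [y_top[k] - z[k] for k in range(m)])

def dft_direct(x):
    """Reference O(n^2) product F_n x, with (F_n)_{kj} = omega_n^{kj}."""
    n = len(x)
    w = cmath.exp(-2j * cmath.pi / n)
    return [sum(w ** (k * j) * x[j] for j in range(n)) for k in range(n)]
```

For example, `fft_recursive([0, 1, 0, 0])` should return, up to rounding, the second column of $F_4$ displayed earlier, namely $[1, -i, -1, i]^T$.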


This is a member of the fast Fourier transform family of algorithms. It has a nonrecursive implementation that is best presented in terms of a factorization of $F_n$. Indeed, it can be shown that $F_n = A_t\cdots A_1P_n$, where

$$A_q = I_r \otimes B_L, \qquad L = 2^q, \qquad r = n/L,$$

with

$$B_L = \begin{bmatrix}I_{L/2} & \Omega_{L/2}\\ I_{L/2} & -\Omega_{L/2}\end{bmatrix} \quad\text{and}\quad \Omega_{L/2} = \operatorname{diag}(1, \omega_L, \ldots, \omega_L^{L/2-1}).$$

The matrix $P_n$ is called the bit reversal permutation, the description of which we omit. (Recall the definition of the Kronecker product "$\otimes$" from §4.5.5.) Note that with this factorization, $y = F_nx$ can be computed as follows:

    x = P_n x
    for q = 1:t
        L = 2^q, r = n/L                          (4.6.5)
        x = (I_r ⊗ B_L)x
    end

The matrices $A_q = (I_r \otimes B_L)$ have 2 nonzeros per row and it is this sparsity that makes it possible to implement the DFT in $O(n\log n)$ flops. In fact, a careful implementation involves $5n\log_2 n$ flops.
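The nonrecursive scheme (4.6.5) can be sketched in Python as follows. Because the text omits the description of $P_n$, this sketch assumes the standard bit-reversal ordering: entry $k$ of $P_nx$ is $x(\mathrm{rev}(k))$, where $\mathrm{rev}(k)$ reverses the $t$-bit binary expansion of $k$. Each pass applies $I_r \otimes B_L$ block by block (two nonzeros per row) without ever forming the matrix. The function names are ours.

```python
import cmath

def bit_reverse_permute(x):
    """Return P_n x, where (P_n x)[k] = x[rev(k)] and rev reverses t = log2(n) bits."""
    n = len(x)
    t = n.bit_length() - 1
    out = []
    for k in range(n):
        rev, kk = 0, k
        for _ in range(t):
            rev = (rev << 1) | (kk & 1)
            kk >>= 1
        out.append(complex(x[rev]))
    return out

def fft_iterative(x):
    """Nonrecursive radix-2 FFT following (4.6.5): x <- A_t ... A_1 P_n x."""
    n = len(x)
    t = n.bit_length() - 1
    if n < 1 or 2 ** t != n:
        raise ValueError("length must be a power of two")
    x = bit_reverse_permute(x)
    for q in range(1, t + 1):
        L = 2 ** q
        r = n // L
        half = L // 2
        w_L = cmath.exp(-2j * cmath.pi / L)
        omega = [w_L ** k for k in range(half)]  # diagonal of Omega_{L/2}
        for j in range(r):                        # j-th diagonal block of I_r (x) B_L
            base = j * L
            for k in range(half):
                top = x[base + k]
                bot = omega[k] * x[base + half + k]
                x[base + k] = top + bot           # [I  Omega] part of B_L
                x[base + half + k] = top - bot    # [I -Omega] part of B_L
    return x
```

The inner "butterfly" touches each entry once per pass, which is the sparsity that the operation count $O(n\log n)$ relies on.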

The DFT matrix has the property that

$$F_n^{-1} = \frac{1}{n}\bar{F}_n. \qquad (4.6.6)$$

That is, the inverse of $F_n$ is obtained by conjugating its entries and scaling by $1/n$. A fast inverse DFT can be obtained from a (forward) FFT merely by replacing all root-of-unity references with their complex conjugates and scaling by $1/n$ at the end.

The value of the DFT is that many "hard problems" are made simple by transforming into Fourier space (via $F_n$). The sought-after solution is then obtained by transforming the Fourier space solution into original coordinates (via $F_n^{-1}$).
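The remark following (4.6.6) can be illustrated with a small Python sketch (the names `_radix2`, `fft` and `ifft` are ours): a single radix-2 routine takes the sign of the exponent as a parameter, so the inverse transform uses the conjugate roots $\exp(+2\pi i/n)$ and then divides by $n$. As before, $n$ is assumed to be a power of two.

```python
import cmath

def _radix2(x, sign):
    """Recursive radix-2 transform using the root exp(sign * 2*pi*i / n)."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    m = n // 2
    top, bot = _radix2(x[0::2], sign), _radix2(x[1::2], sign)
    w = cmath.exp(sign * 2j * cmath.pi / n)
    z = [w ** k * bot[k] for k in range(m)]
    return [top[k] + z[k] for k in range(m)] + [top[k] - z[k] for k in range(m)]

def fft(x):
    """y = F_n x, with omega_n = exp(-2*pi*i/n)."""
    return _radix2(x, -1)

def ifft(y):
    """x = F_n^{-1} y = (1/n) conj(F_n) y: conjugate roots, then scale by 1/n."""
    n = len(y)
    return [v / n for v in _radix2(y, +1)]
```

The conjugate root $\bar\omega_n$ also satisfies $\bar\omega_n^2 = \bar\omega_{n/2}$, which is why the same recursion computes $\bar{F}_n y$.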


Problems 
P4.6.1 Show that if $V = V(x_0,\ldots,x_n)$, then

$$\det(V) = \prod_{n \ge i > j \ge 0}(x_i - x_j).$$

P4.6.2 (Gautschi 1975a) Verify the following inequality for the $n = 1$ case above:

$$\|V^{-1}\|_\infty \le \max_{0 \le k \le n}\ \prod_{\substack{i=0\\ i \ne k}}^{n}\frac{1 + |x_i|}{|x_k - x_i|}.$$

Equality results if the $x_i$ are all on the same ray in the complex plane.

P4.6.3 Suppose $w = [1, \omega_n, \omega_n^2, \ldots, \omega_n^{n/2-1}]$ where $n = 2^t$. Using colon notation, express

$$[1, \omega_r, \omega_r^2, \ldots, \omega_r^{r/2-1}]$$

as a subvector of $w$ where $r = 2^q$, $q = 1{:}t$.

P4.6.4 Prove (4.6.6).

P4.6.5 Expand the operation $x = (I_r \otimes B_L)x$ in (4.6.5) into a double loop and count the number of flops required by your implementation. (Ignore the details of $x = P_nx$.)

P4.6.6 Suppose $n = 3m$ and examine

$$G = [\,F_n(:,0{:}3{:}n-1)\ \ F_n(:,1{:}3{:}n-1)\ \ F_n(:,2{:}3{:}n-1)\,]$$

as a 3-by-3 block matrix, looking for scaled copies of $F_m$. Based on what you find, develop a recursive radix-3 FFT analogous to the radix-2 implementation in the text.


Notes and References for Sec. 4.6 
Our discussion of Vandermonde linear systems is drawn from the papers

A. Björck and V. Pereyra (1970). "Solution of Vandermonde Systems of Equations," Math. Comp. 24, 893-903.

A. Björck and T. Elfving (1973). "Algorithms for Confluent Vandermonde Systems," Numer. Math. 21, 130-37.

The divided difference computations we discussed are detailed in chapter 2 of

S.D. Conte and C. de Boor (1980). Elementary Numerical Analysis: An Algorithmic Approach, 3rd ed., McGraw-Hill, New York.

The latter reference includes an Algol procedure. Error analyses of Vandermonde system solvers include

N.J. Higham (1987b). "Error Analysis of the Björck-Pereyra Algorithms for Solving Vandermonde Systems," Numer. Math. 50, 613-632.

N.J. Higham (1988). "Fast Solution of Vandermonde-like Systems Involving Orthogonal Polynomials," IMA J. Num. Anal. 8, 473-486.

N.J. Higham (1990). "Stability Analysis of Algorithms for Solving Confluent Vandermonde-like Systems," SIAM J. Matrix Anal. Appl. 11, 23-41.

S.G. Bartels and D.J. Higham (1992). "The Structured Sensitivity of Vandermonde-Like Systems," Numer. Math. 62, 17-34.

J.M. Varah (1993). "Errors and Perturbations in Vandermonde Systems," IMA J. Num. Anal. 13, 1-12.

Interesting theoretical results concerning the condition of Vandermonde systems may be found in

W. Gautschi (1975a). "Norm Estimates for Inverses of Vandermonde Matrices," Numer. Math. 23, 337-47.

W. Gautschi (1975b). "Optimally Conditioned Vandermonde Matrices," Numer. Math. 24, 1-12.

The basic algorithms presented can be extended to cover confluent Vandermonde systems, block Vandermonde systems, and Vandermonde systems that are based on other polynomial bases:

G. Galimberti and V. Pereyra (1970). "Numerical Differentiation and the Solution of Multidimensional Vandermonde Systems," Math. Comp. 24, 357-64.

G. Galimberti and V. Pereyra (1971). "Solving Confluent Vandermonde Systems of Hermitian Type," Numer. Math. 18, 44-60.

H. Van de Vel (1977). "Numerical Treatment of a Generalized Vandermonde System of Equations," Lin. Alg. and Its Applic. 17, 149-74.

G.H. Golub and W.P. Tang (1981). "The Block Decomposition of a Vandermonde Matrix and Its Applications," BIT 21, 505-17.

D. Calvetti and L. Reichel (1992). "A Chebychev-Vandermonde Solver," Lin. Alg. and Its Applic. 172, 219-229.

D. Calvetti and L. Reichel (1993). "Fast Inversion of Vandermonde-Like Matrices Involving Orthogonal Polynomials," BIT 33, 473-484.

H. Lu (1994). "Fast Solution of Confluent Vandermonde Linear Systems," SIAM J. Matrix Anal. Appl. 15, 1277-1289.

H. Lu (1996). "Solution of Vandermonde-like Systems and Confluent Vandermonde-like Systems," SIAM J. Matrix Anal. Appl. 17, 127-138.

The FFT literature is very extensive and scattered. For an overview of the area couched in Kronecker product notation, see

C.F. Van Loan (1992). Computational Frameworks for the Fast Fourier Transform, SIAM Publications, Philadelphia, PA.

The point of view in this text is that different FFTs correspond to different factorizations of the DFT matrix. These are sparse factorizations in that the factors have very few nonzeros per row.
4.7 Toeplitz and Related Systems

Matrices whose entries are constant along each diagonal arise in many applications and are called Toeplitz matrices. Formally, $T \in \mathbb{R}^{n\times n}$ is Toeplitz if there exist scalars $r_{-n+1},\ldots,r_0,\ldots,r_{n-1}$ such that $t_{ij} = r_{j-i}$ for all $i$ and $j$. Thus,

$$T = \begin{bmatrix}r_0&r_1&r_2&r_3\\r_{-1}&r_0&r_1&r_2\\r_{-2}&r_{-1}&r_0&r_1\\r_{-3}&r_{-2}&r_{-1}&r_0\end{bmatrix} = \begin{bmatrix}3&1&7&6\\4&3&1&7\\0&4&3&1\\9&0&4&3\end{bmatrix}$$

is Toeplitz.

Toeplitz matrices belong to the larger class of persymmetric matrices. We say that $B \in \mathbb{R}^{n\times n}$ is persymmetric if it is symmetric about its northeast-southwest diagonal, i.e., $b_{ij} = b_{n-j+1,n-i+1}$ for all $i$ and $j$. This is equivalent to requiring $B = EB^TE$ where $E = [e_n,\ldots,e_1] = I_n(:,n{:}-1{:}1)$ is the $n$-by-$n$ exchange matrix, i.e.,

$$E = \begin{bmatrix}0&0&0&1\\0&0&1&0\\0&1&0&0\\1&0&0&0\end{bmatrix} \qquad (n = 4).$$

It is easy to verify that (a) Toeplitz matrices are persymmetric and (b) the inverse of a nonsingular Toeplitz matrix is persymmetric. In this section we show how the careful exploitation of (b) enables us to solve Toeplitz systems with $O(n^2)$ flops. The discussion focuses on the important case when $T$ is also symmetric and positive definite. Unsymmetric Toeplitz systems and connections with circulant matrices and the discrete Fourier transform are briefly discussed.


4.7.1 Three Problems 
Assume that we have scalars $r_1,\ldots,r_n$ such that for $k = 1{:}n$ the matrices

$$T_k = \begin{bmatrix}1 & r_1 & \cdots & r_{k-2} & r_{k-1}\\ r_1 & 1 & \cdots & r_{k-3} & r_{k-2}\\ \vdots & & \ddots & & \vdots\\ r_{k-2} & r_{k-3} & \cdots & 1 & r_1\\ r_{k-1} & r_{k-2} & \cdots & r_1 & 1\end{bmatrix}$$

are positive definite. (There is no loss of generality in normalizing the diagonal.) Three algorithms are described in this section:

• Durbin's algorithm for the Yule-Walker problem $T_ny = -[r_1,\ldots,r_n]^T$.
• Levinson's algorithm for the general right-hand side problem $T_nx = b$.
• Trench's algorithm for computing $B = T_n^{-1}$.

In deriving these methods, we denote the $k$-by-$k$ exchange matrix by $E_k$, i.e., $E_k = I_k(:,k{:}-1{:}1)$.


4.7.2 Solving the Yule-Walker Equations 


We begin by presenting Durbin's algorithm for the Yule-Walker equations, which arise in conjunction with certain linear prediction problems. Suppose for some $k$ that satisfies $1 \le k \le n-1$ we have solved the $k$th order Yule-Walker system $T_ky = -r = -(r_1,\ldots,r_k)^T$. We now show how the $(k+1)$st order Yule-Walker system

$$\begin{bmatrix}T_k & E_kr\\ r^TE_k & 1\end{bmatrix}\begin{bmatrix}z\\ \alpha\end{bmatrix} = -\begin{bmatrix}r\\ r_{k+1}\end{bmatrix}$$

can be solved in $O(k)$ flops. First observe that

$$z = T_k^{-1}(-r - \alpha E_kr) = y - \alpha T_k^{-1}E_kr$$

and

$$\alpha = -r_{k+1} - r^TE_kz.$$

Since $T_k^{-1}$ is persymmetric, $T_k^{-1}E_k = E_kT_k^{-1}$ and thus,

$$z = y - \alpha E_kT_k^{-1}r = y + \alpha E_ky.$$

By substituting this into the above expression for $\alpha$ we find

$$\alpha = -r_{k+1} - r^TE_k(y + \alpha E_ky) = -(r_{k+1} + r^TE_ky)/(1 + r^Ty).$$

The denominator is positive because $T_{k+1}$ is positive definite and because

$$\begin{bmatrix}I & E_ky\\ 0 & 1\end{bmatrix}^T\begin{bmatrix}T_k & E_kr\\ r^TE_k & 1\end{bmatrix}\begin{bmatrix}I & E_ky\\ 0 & 1\end{bmatrix} = \begin{bmatrix}T_k & 0\\ 0 & 1 + r^Ty\end{bmatrix}.$$

We have illustrated the $k$th step of an algorithm proposed by Durbin (1960). It proceeds by solving the Yule-Walker systems

$$T_ky^{(k)} = -r^{(k)} = -[r_1,\ldots,r_k]^T$$

for $k = 1{:}n$ as follows:
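$$\begin{array}{l}
y^{(1)} = -r_1\\
\textbf{for } k = 1{:}n-1\\
\quad \beta_k = 1 + [r^{(k)}]^Ty^{(k)}\\
\quad \alpha_k = -\left(r_{k+1} + [r^{(k)}]^TE_ky^{(k)}\right)/\beta_k\\
\quad z^{(k)} = y^{(k)} + \alpha_kE_ky^{(k)}\\
\quad y^{(k+1)} = \begin{bmatrix}z^{(k)}\\ \alpha_k\end{bmatrix}\\
\textbf{end}
\end{array} \qquad (4.7.1)$$

As it stands, this algorithm would require $3n^2$ flops to generate $y = y^{(n)}$. It is possible, however, to reduce the amount of work even further by exploiting some of the above expressions:

$$\begin{aligned}
\beta_k &= 1 + [r^{(k)}]^Ty^{(k)}\\
&= 1 + \begin{bmatrix}r^{(k-1)}\\ r_k\end{bmatrix}^T\begin{bmatrix}y^{(k-1)} + \alpha_{k-1}E_{k-1}y^{(k-1)}\\ \alpha_{k-1}\end{bmatrix}\\
&= \left(1 + [r^{(k-1)}]^Ty^{(k-1)}\right) + \alpha_{k-1}\left([r^{(k-1)}]^TE_{k-1}y^{(k-1)} + r_k\right)\\
&= \beta_{k-1} + \alpha_{k-1}(-\beta_{k-1}\alpha_{k-1})\\
&= (1 - \alpha_{k-1}^2)\beta_{k-1},
\end{aligned}$$

where the fourth line uses the definition of $\alpha_{k-1}$ in (4.7.1).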


Using this recursion we obtain the following algorithm:

Algorithm 4.7.1 (Durbin) Given real numbers $1 = r_0, r_1, \ldots, r_n$ such that $T = (r_{|i-j|}) \in \mathbb{R}^{n\times n}$ is positive definite, the following algorithm computes $y \in \mathbb{R}^n$ such that $Ty = -(r_1,\ldots,r_n)^T$.

    y(1) = -r(1); β = 1; α = -r(1)
    for k = 1:n-1
        β = (1 - α^2)β
        α = -(r(k+1) + r(k:-1:1)^T y(1:k))/β
        z(1:k) = y(1:k) + αy(k:-1:1)
        y(1:k+1) = [z(1:k); α]
    end

This algorithm requires $2n^2$ flops. We have included an auxiliary vector $z$ for clarity, but it can be avoided.


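Example 4.7.1 Suppose we wish to solve the Yule-Walker system

$$\begin{bmatrix}1&.5&.2\\.5&1&.5\\.2&.5&1\end{bmatrix}\begin{bmatrix}y_1\\y_2\\y_3\end{bmatrix} = -\begin{bmatrix}.5\\.2\\.1\end{bmatrix}$$

using Algorithm 4.7.1. After one pass through the loop we obtain

$$\alpha = 1/15, \qquad \beta = 3/4, \qquad y = \begin{bmatrix}-8/15\\ 1/15\end{bmatrix}.$$

We then compute

$$\begin{aligned}
\beta &= (1 - \alpha^2)\beta = 56/75\\
\alpha &= -(r_3 + r_1y_2 + r_2y_1)/\beta = -1/28\\
z_1 &= y_1 + \alpha y_2 = -225/420\\
z_2 &= y_2 + \alpha y_1 = 36/420,
\end{aligned}$$

giving the final solution $y = [-75,\ 12,\ -5]^T/140$.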
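Algorithm 4.7.1 transcribes almost line for line into Python. In this minimal sketch (the name `durbin` is ours) Python lists are 0-based, so `r[0]` holds $r_1$, and exact `Fraction` arithmetic is assumed so that Example 4.7.1 is reproduced exactly.

```python
from fractions import Fraction

def durbin(r):
    """Algorithm 4.7.1 (Durbin).

    r = [r_1, ..., r_n] with r_0 = 1 implied and T = (r_{|i-j|}) positive
    definite. Returns y (0-based list) such that T y = -[r_1, ..., r_n]^T.
    """
    n = len(r)
    y = [-r[0]]
    beta, alpha = 1, -r[0]
    for k in range(1, n):                     # k = 1:n-1 in the text
        beta = (1 - alpha ** 2) * beta
        # alpha = -(r(k+1) + r(k:-1:1)^T y(1:k)) / beta
        alpha = -(r[k] + sum(r[k - 1 - i] * y[i] for i in range(k))) / beta
        # z(1:k) = y(1:k) + alpha * y(k:-1:1)
        z = [y[i] + alpha * y[k - 1 - i] for i in range(k)]
        y = z + [alpha]                       # y(1:k+1) = [z(1:k); alpha]
    return y

r = [Fraction(1, 2), Fraction(1, 5), Fraction(1, 10)]
print([yi * 140 for yi in durbin(r)])
# [Fraction(-75, 1), Fraction(12, 1), Fraction(-5, 1)]
```

Only the scalars $\alpha$, $\beta$ and the current $y$ are carried from step to step, which is why the work is $O(k)$ per step.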


4.7.3 The General Right Hand Side Problem 


With a little extra work, it is possible to solve a symmetric positive definite Toeplitz system that has an arbitrary right-hand side. Suppose that we have solved the system

$$T_kx = b = (b_1,\ldots,b_k)^T \qquad (4.7.2)$$

for some $k$ satisfying $1 \le k < n$ and that we now wish to solve

$$\begin{bmatrix}T_k & E_kr\\ r^TE_k & 1\end{bmatrix}\begin{bmatrix}v\\ \mu\end{bmatrix} = \begin{bmatrix}b\\ b_{k+1}\end{bmatrix}. \qquad (4.7.3)$$

Here, $r = (r_1,\ldots,r_k)^T$ as above. Assume also that the solution to the $k$th order Yule-Walker system $T_ky = -r$ is available. From $T_kv + \mu E_kr = b$ it follows that

$$v = T_k^{-1}(b - \mu E_kr) = x - \mu T_k^{-1}E_kr = x + \mu E_ky$$

and so

$$\begin{aligned}
\mu &= b_{k+1} - r^TE_kv\\
&= b_{k+1} - r^TE_kx - \mu r^Ty\\
&= (b_{k+1} - r^TE_kx)/(1 + r^Ty).
\end{aligned}$$

Consequently, we can effect the transition from (4.7.2) to (4.7.3) in $O(k)$ flops.

Overall, we can efficiently solve the system $T_nx = b$ by solving the systems $T_kx^{(k)} = b^{(k)} = (b_1,\ldots,b_k)^T$ and $T_ky^{(k)} = -r^{(k)} = -(r_1,\ldots,r_k)^T$ "in parallel" for $k = 1{:}n$. This is the gist of the following algorithm:


Algorithm 4.7.2 (Levinson) Given $b \in \mathbb{R}^n$ and real numbers $1 = r_0, r_1, \ldots, r_n$ such that $T = (r_{|i-j|}) \in \mathbb{R}^{n\times n}$ is positive definite, the following algorithm computes $x \in \mathbb{R}^n$ such that $Tx = b$.

    y(1) = -r(1); x(1) = b(1); β = 1; α = -r(1)
    for k = 1:n-1
        β = (1 - α^2)β; μ = (b(k+1) - r(1:k)^T x(k:-1:1))/β
        v(1:k) = x(1:k) + μy(k:-1:1)
        x(1:k+1) = [v(1:k); μ]
        if k < n-1
            α = -(r(k+1) + r(1:k)^T y(k:-1:1))/β
            z(1:k) = y(1:k) + αy(k:-1:1)
            y(1:k+1) = [z(1:k); α]
        end
    end

This algorithm requires $4n^2$ flops. The vectors $z$ and $v$ are for clarity and can be avoided in a detailed implementation.


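Example 4.7.2 Suppose we wish to solve the symmetric positive definite Toeplitz system

$$\begin{bmatrix}1&.5&.2\\.5&1&.5\\.2&.5&1\end{bmatrix}\begin{bmatrix}x_1\\x_2\\x_3\end{bmatrix} = \begin{bmatrix}4\\-1\\3\end{bmatrix}$$

using the above algorithm. After one pass through the loop we obtain

$$\alpha = 1/15, \qquad \beta = 3/4, \qquad y = \begin{bmatrix}-8/15\\ 1/15\end{bmatrix}, \qquad x = \begin{bmatrix}6\\ -4\end{bmatrix}.$$

We then compute

$$\begin{aligned}
\beta &= (1 - \alpha^2)\beta = 56/75, & \mu &= (b_3 - r_1x_2 - r_2x_1)/\beta = 285/56,\\
v_1 &= x_1 + \mu y_2 = 355/56, & v_2 &= x_2 + \mu y_1 = -376/56,
\end{aligned}$$

giving the final solution $x = [355,\ -376,\ 285]^T/56$.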
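Similarly, here is a minimal Python sketch of Algorithm 4.7.2 (the name `levinson` is ours), again 0-based with `r[0]` holding $r_1$ and using `Fraction` arithmetic to reproduce Example 4.7.2. Only $r_1,\ldots,r_{n-1}$ are used, since the Yule-Walker update is skipped on the last pass.

```python
from fractions import Fraction

def levinson(r, b):
    """Algorithm 4.7.2 (Levinson).

    r = [r_1, ..., r_{n-1}] (r_0 = 1 implied; extra entries are ignored) and
    T = (r_{|i-j|}) positive definite. Returns x with T x = b.
    """
    n = len(b)
    x = [b[0]]
    y = [-r[0]] if n > 1 else []
    beta, alpha = 1, (y[0] if y else 0)
    for k in range(1, n):                     # k = 1:n-1 in the text
        beta = (1 - alpha ** 2) * beta
        # mu = (b(k+1) - r(1:k)^T x(k:-1:1)) / beta
        mu = (b[k] - sum(r[i] * x[k - 1 - i] for i in range(k))) / beta
        v = [x[i] + mu * y[k - 1 - i] for i in range(k)]
        x = v + [mu]
        if k < n - 1:                         # advance the Yule-Walker solution
            alpha = -(r[k] + sum(r[i] * y[k - 1 - i] for i in range(k))) / beta
            z = [y[i] + alpha * y[k - 1 - i] for i in range(k)]
            y = z + [alpha]
    return x

r = [Fraction(1, 2), Fraction(1, 5)]
print([xi * 56 for xi in levinson(r, [4, -1, 3])])
# [Fraction(355, 1), Fraction(-376, 1), Fraction(285, 1)]
```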


4.7.4 Computing the Inverse 

One of the most surprising properties of a symmetric positive definite Toeplitz matrix $T_n$ is that its complete inverse can be calculated in $O(n^2)$ flops. To derive the algorithm for doing this, partition $T_n^{-1}$ as follows

$$T_n^{-1} = \begin{bmatrix}A & Er\\ r^TE & 1\end{bmatrix}^{-1} = \begin{bmatrix}B & v\\ v^T & \gamma\end{bmatrix} \qquad (4.7.4)$$

where $A = T_{n-1}$, $E = E_{n-1}$, and $r = (r_1,\ldots,r_{n-1})^T$. From the equation

$$\begin{bmatrix}A & Er\\ r^TE & 1\end{bmatrix}\begin{bmatrix}v\\ \gamma\end{bmatrix} = \begin{bmatrix}0\\ 1\end{bmatrix}$$

it follows that $Av = -\gamma Er = -\gamma E(r_1,\ldots,r_{n-1})^T$ and $\gamma = 1 - r^TEv$. If $y$ solves the $(n-1)$st order Yule-Walker system $Ay = -r$, then these expressions imply that

$$\gamma = 1/(1 + r^Ty), \qquad v = \gamma Ey.$$

Thus, the last row and column of $T_n^{-1}$ are readily obtained.

It remains for us to develop working formulae for the entries of the submatrix $B$ in (4.7.4). Since $AB + Erv^T = I_{n-1}$, it follows that

$$B = A^{-1} - (A^{-1}Er)v^T = A^{-1} + \frac{vv^T}{\gamma}.$$

Now since $A = T_{n-1}$ is nonsingular and Toeplitz, its inverse is persymmetric. Thus,

$$\begin{aligned}
b_{ij} &= (A^{-1})_{ij} + \frac{v_iv_j}{\gamma}\\
&= (A^{-1})_{n-j,n-i} + \frac{v_iv_j}{\gamma}\\
&= b_{n-j,n-i} - \frac{v_{n-j}v_{n-i}}{\gamma} + \frac{v_iv_j}{\gamma}\\
&= b_{n-j,n-i} + \frac{1}{\gamma}\left(v_iv_j - v_{n-j}v_{n-i}\right).
\end{aligned} \qquad (4.7.5)$$

This indicates that although $B$ is not persymmetric, we can readily compute an element $b_{ij}$ from its reflection across the northeast-southwest axis. Coupling this with the fact that $A^{-1}$ is persymmetric enables us to determine $B$ from its "edges" to its "interior."

Because the order of operations is rather cumbersome to describe, we preview the formal specification of the algorithm pictorially. To this end, assume that we know the last column and row of $T_n^{-1}$:

$$T_n^{-1} = \begin{bmatrix}u&u&u&u&u&k\\u&u&u&u&u&k\\u&u&u&u&u&k\\u&u&u&u&u&k\\u&u&u&u&u&k\\k&k&k&k&k&k\end{bmatrix}.$$

Here $u$ and $k$ denote the unknown and the known entries respectively, and $n = 6$. Alternately exploiting the persymmetry of $T_n^{-1}$ and the recursion (4.7.5), we can compute $B$, the leading $(n-1)$-by-$(n-1)$ block of $T_n^{-1}$, as follows:

$$\xrightarrow{\text{persym.}}\ \begin{bmatrix}k&k&k&k&k&k\\k&u&u&u&u&k\\k&u&u&u&u&k\\k&u&u&u&u&k\\k&u&u&u&u&k\\k&k&k&k&k&k\end{bmatrix}\ \xrightarrow{(4.7.5)}\ \begin{bmatrix}k&k&k&k&k&k\\k&u&u&u&k&k\\k&u&u&u&k&k\\k&u&u&u&k&k\\k&k&k&k&k&k\\k&k&k&k&k&k\end{bmatrix}$$

$$\xrightarrow{\text{persym.}}\ \begin{bmatrix}k&k&k&k&k&k\\k&k&k&k&k&k\\k&k&u&u&k&k\\k&k&u&u&k&k\\k&k&k&k&k&k\\k&k&k&k&k&k\end{bmatrix}\ \xrightarrow{(4.7.5)}\ \begin{bmatrix}k&k&k&k&k&k\\k&k&k&k&k&k\\k&k&u&k&k&k\\k&k&k&k&k&k\\k&k&k&k&k&k\\k&k&k&k&k&k\end{bmatrix}$$

$$\xrightarrow{\text{persym.}}\ \begin{bmatrix}k&k&k&k&k&k\\k&k&k&k&k&k\\k&k&k&k&k&k\\k&k&k&k&k&k\\k&k&k&k&k&k\\k&k&k&k&k&k\end{bmatrix}.$$


Of course, when computing a matrix that is both symmetric and persymmetric, such as $T_n^{-1}$, it is only necessary to compute the "upper wedge" of the matrix, e.g.,

$$\begin{array}{cccccc}\times&\times&\times&\times&\times&\times\\ &\times&\times&\times&\times& \\ & &\times&\times& & \end{array} \qquad (n = 6).$$

With this last observation, we are ready to present the overall algorithm.

Algorithm 4.7.3 (Trench) Given real numbers $1 = r_0, r_1, \ldots, r_n$ such that $T = (r_{|i-j|}) \in \mathbb{R}^{n\times n}$ is positive definite, the following algorithm computes $B = T_n^{-1}$. Only those $b_{ij}$ for which $i \le j$ and $i + j \le n + 1$ are computed.

    Use Algorithm 4.7.1 to solve T_{n-1} y = -(r_1, ..., r_{n-1})^T.
    γ = 1/(1 + r(1:n-1)^T y(1:n-1))
    v(1:n-1) = γ y(n-1:-1:1)
    B(1,1) = γ
    B(1,2:n) = v(n-1:-1:1)^T
    for i = 2:floor((n-1)/2)+1
        for j = i:n-i+1
            B(i,j) = B(i-1,j-1) + (v(n+1-j)v(n+1-i) - v(i-1)v(j-1))/γ
        end
    end

This algorithm requires $13n^2/4$ flops.


Example 4.7.3 If the above algorithm is applied to compute the inverse $B$ of the positive definite Toeplitz matrix

$$\begin{bmatrix}1&.5&.2\\.5&1&.5\\.2&.5&1\end{bmatrix},$$

then we obtain $\gamma = 75/56$, $b_{11} = 75/56$, $b_{12} = -5/7$, $b_{13} = 5/56$, and $b_{22} = 12/7$.
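The following Python sketch of Algorithm 4.7.3 (function names ours) computes the upper wedge with the recursion of the algorithm and then fills in the rest of $T_n^{-1}$ using its symmetry and persymmetry, so that the whole inverse can be checked. A compact copy of the Durbin sketch is included to keep the block self-contained. Indices are 0-based, so the wedge condition $i \le j$, $i + j \le n + 1$ becomes `i <= j` and `i + j <= n - 1`.

```python
from fractions import Fraction

def durbin(r):
    """Algorithm 4.7.1: y with T y = -r for T = (r_{|i-j|}), r_0 = 1."""
    y, beta, alpha = [-r[0]], 1, -r[0]
    for k in range(1, len(r)):
        beta = (1 - alpha ** 2) * beta
        alpha = -(r[k] + sum(r[k - 1 - i] * y[i] for i in range(k))) / beta
        y = [y[i] + alpha * y[k - 1 - i] for i in range(k)] + [alpha]
    return y

def trench(r):
    """Algorithm 4.7.3 (Trench): full B = T_n^{-1} for T_n = (r_{|i-j|}).

    r = [r_1, ..., r_{n-1}] with r_0 = 1 implied. Only the upper wedge is
    computed by the recursion; the rest uses symmetry and persymmetry.
    """
    n = len(r) + 1
    y = durbin(r)                                  # T_{n-1} y = -(r_1..r_{n-1})
    gamma = 1 / (1 + sum(ri * yi for ri, yi in zip(r, y)))
    v = [gamma * yi for yi in reversed(y)]         # v(1:n-1) = gamma*y(n-1:-1:1)
    B = [[None] * n for _ in range(n)]
    B[0][0] = gamma
    for j in range(1, n):                          # B(1,2:n) = v(n-1:-1:1)^T
        B[0][j] = v[n - 1 - j]
    for i in range(1, (n - 1) // 2 + 1):
        for j in range(i, n - i):
            B[i][j] = B[i - 1][j - 1] + (v[n - 1 - j] * v[n - 1 - i]
                                         - v[i - 1] * v[j - 1]) / gamma
    for i in range(n):                             # fill the remaining entries
        for j in range(n):
            a, c = min(i, j), max(i, j)            # symmetry
            if a + c > n - 1:                      # persymmetry
                a, c = n - 1 - c, n - 1 - a
            B[i][j] = B[a][c]
    return B

B = trench([Fraction(1, 2), Fraction(1, 5)])
print(B[0][0], B[0][1], B[0][2], B[1][1])   # 75/56 -5/7 5/56 12/7
```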


4.7.5 Stability Issues 


Error analyses for the above algorithms have been performed by Cybenko (1978), and we briefly report on some of his findings.

The key quantities turn out to be the $\alpha_k$ in (4.7.1). In exact arithmetic these scalars satisfy

$$|\alpha_k| < 1$$

and can be used to bound $\|T_n^{-1}\|_1$:

$$\max\left\{\frac{1}{\prod_{j=1}^{n-1}(1-\alpha_j^2)},\ \frac{1}{\prod_{j=1}^{n-1}(1-\alpha_j)}\right\} \le \|T_n^{-1}\|_1 \le \prod_{j=1}^{n-1}\frac{1+|\alpha_j|}{1-|\alpha_j|}. \qquad (4.7.6)$$

Moreover, the solution to the Yule-Walker system $T_ny = -r(1{:}n)$ satisfies

$$\|y\|_1 = \left(\prod_{k=1}^{n}(1+\alpha_k)\right) - 1 \qquad (4.7.7)$$

provided all the $\alpha_k$ are non-negative.

Now if $\hat{y}$ is the computed Durbin solution to the Yule-Walker equations, then $r_D = T_n\hat{y} + r$ can be bounded as follows

$$\|r_D\| \approx \mathbf{u}\prod_{k=1}^{n}(1 + |\hat{\alpha}_k|)$$

where $\hat{\alpha}_k$ is the computed version of $\alpha_k$. By way of comparison, since each $|r_i|$ is bounded by unity, it follows that $\|r_C\| \approx \mathbf{u}\|y\|_1$, where $r_C$ is the residual associated with the computed solution obtained via Cholesky. Note that the two residuals are of comparable magnitude provided (4.7.7) holds. Experimental evidence suggests that this is the case even if some of the $\alpha_k$ are negative. Similar comments apply to the numerical behavior of the Levinson algorithm.

For the Trench method, the computed inverse $\hat{B}$ of $T_n^{-1}$ can be shown to satisfy

$$\frac{\|T_n^{-1} - \hat{B}\|_1}{\|T_n^{-1}\|_1} \approx \mathbf{u}\prod_{k=1}^{n}\frac{1+|\hat{\alpha}_k|}{1-|\hat{\alpha}_k|}.$$

In light of (4.7.6) we see that the right-hand side is an approximate upper bound for $\mathbf{u}\|T_n^{-1}\|_1$, which is approximately the size of the relative error when $T_n^{-1}$ is calculated using the Cholesky factorization.
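4.7.6 The Unsymmetric Case

Similar recursions can be developed for the unsymmetric case. Suppose we are given scalars $r_1,\ldots,r_{n-1}$, $p_1,\ldots,p_{n-1}$, and $b_1,\ldots,b_n$ and that we want to solve a linear system $Tx = b$ of the form

$$\begin{bmatrix}1&r_1&r_2&r_3&r_4\\p_1&1&r_1&r_2&r_3\\p_2&p_1&1&r_1&r_2\\p_3&p_2&p_1&1&r_1\\p_4&p_3&p_2&p_1&1\end{bmatrix}\begin{bmatrix}x_1\\x_2\\x_3\\x_4\\x_5\end{bmatrix} = \begin{bmatrix}b_1\\b_2\\b_3\\b_4\\b_5\end{bmatrix} \qquad (n = 5).$$

In the process that follows we require the leading principal submatrices $T_k = T(1{:}k,1{:}k)$, $k = 1{:}n$, to be nonsingular. Using the same notation as above, it can be shown that if we have the solutions to the $k$-by-$k$ systems

$$\begin{aligned}
T_k^Ty &= -r = -[r_1\ r_2\ \cdots\ r_k]^T\\
T_kw &= -p = -[p_1\ p_2\ \cdots\ p_k]^T\\
T_kx &= b = [b_1\ b_2\ \cdots\ b_k]^T,
\end{aligned} \qquad (4.7.8)$$

then we can obtain solutions to

$$\begin{aligned}
\begin{bmatrix}T_k & E_kr\\ p^TE_k & 1\end{bmatrix}^T\begin{bmatrix}z\\ \alpha\end{bmatrix} &= -\begin{bmatrix}r\\ r_{k+1}\end{bmatrix}\\
\begin{bmatrix}T_k & E_kr\\ p^TE_k & 1\end{bmatrix}\begin{bmatrix}u\\ \nu\end{bmatrix} &= -\begin{bmatrix}p\\ p_{k+1}\end{bmatrix}\\
\begin{bmatrix}T_k & E_kr\\ p^TE_k & 1\end{bmatrix}\begin{bmatrix}v\\ \mu\end{bmatrix} &= \begin{bmatrix}b\\ b_{k+1}\end{bmatrix}
\end{aligned} \qquad (4.7.9)$$

in $O(k)$ flops. This means that in principle it is possible to solve an unsymmetric Toeplitz system in $O(n^2)$ flops. However, the stability of the process cannot be assured unless the matrices $T_k = T(1{:}k,1{:}k)$ are sufficiently well conditioned.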


4.7.7 Circulant Systems 


A very important class of Toeplitz matrices are the circulant matrices. Here is an example:

$$C(v) = \begin{bmatrix}v_0&v_4&v_3&v_2&v_1\\v_1&v_0&v_4&v_3&v_2\\v_2&v_1&v_0&v_4&v_3\\v_3&v_2&v_1&v_0&v_4\\v_4&v_3&v_2&v_1&v_0\end{bmatrix}.$$

Notice that each column of a circulant is a "downshifted" version of its predecessor. In particular, if we define the downshift permutation $S_n$ by

$$S_5 = \begin{bmatrix}0&0&0&0&1\\1&0&0&0&0\\0&1&0&0&0\\0&0&1&0&0\\0&0&0&1&0\end{bmatrix} \qquad (n = 5)$$

and $v = [v_0\ v_1\ \cdots\ v_{n-1}]^T$, then $C(v) = [v,\ S_nv,\ S_n^2v,\ \ldots,\ S_n^{n-1}v]$.

There are important connections between circulant matrices, Toeplitz matrices, and the DFT. First of all, it can be shown that

$$C(v) = F_n^{-1}\operatorname{diag}(F_nv)F_n. \qquad (4.7.10)$$

This means that a product of the form $y = C(v)x$ can be computed at "FFT speed":

$$\tilde{x} = F_nx, \qquad \tilde{v} = F_nv, \qquad z = \tilde{v}\,.\!*\,\tilde{x}, \qquad y = F_n^{-1}z.$$

In other words, three DFTs and a vector multiply suffice to carry out the product of a circulant matrix and a vector. Products of this form are called convolutions and they are ubiquitous in signal processing and other areas.
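The three-transform recipe based on (4.7.10) can be checked numerically with the Python sketch below. For brevity the transforms are evaluated with a direct $O(n^2)$ DFT (the helpers `dft`, `circulant` and `circulant_matvec` are ours); in an actual fast implementation each `dft` call would be an FFT, which is where the "FFT speed" comes from.

```python
import cmath

def dft(x, sign=-1):
    """Direct transform with root exp(sign*2*pi*i/n); sign=-1 gives F_n x.
    (O(n^2) for brevity; an FFT would be used in practice.)"""
    n = len(x)
    w = cmath.exp(sign * 2j * cmath.pi / n)
    return [sum(w ** (k * j) * x[j] for j in range(n)) for k in range(n)]

def circulant(v):
    """C(v) = [v, S_n v, ..., S_n^{n-1} v], i.e. C[i][j] = v[(i - j) mod n]."""
    n = len(v)
    return [[v[(i - j) % n] for j in range(n)] for i in range(n)]

def circulant_matvec(v, x):
    """y = C(v) x via (4.7.10): y = F_n^{-1}((F_n v) .* (F_n x))."""
    n = len(v)
    x_hat = dft(x)                                # F_n x
    v_hat = dft(v)                                # F_n v
    z = [a * b for a, b in zip(v_hat, x_hat)]     # componentwise product
    return [c / n for c in dft(z, sign=+1)]       # F_n^{-1} = conj(F_n)/n
```

The explicit `circulant` helper is only there to compare against the ordinary matrix-vector product.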

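Toeplitz-vector products can also be computed fast. The key idea is that any Toeplitz matrix can be "embedded" in a circulant. For example,

$$T = \begin{bmatrix}5&2&7\\4&5&2\\9&4&5\end{bmatrix}$$

is the leading 3-by-3 submatrix of the circulant

$$C = \begin{bmatrix}5&2&7&9&4\\4&5&2&7&9\\9&4&5&2&7\\7&9&4&5&2\\2&7&9&4&5\end{bmatrix}.$$

In general, if $T = (t_{ij})$ is an $n$-by-$n$ Toeplitz matrix, then $T = C(1{:}n,1{:}n)$ where $C \in \mathbb{R}^{(2n-1)\times(2n-1)}$ is a circulant with

$$C(:,1) = \begin{bmatrix}T(1{:}n,1)\\ T(1,n{:}-1{:}2)^T\end{bmatrix}.$$

Note that if $y = Cx$ and $x(n+1{:}2n-1) = 0$, then $y(1{:}n) = Tx(1{:}n)$, showing that Toeplitz-vector products can also be computed at "FFT speed."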
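A minimal Python sketch of the embedding: build the first column of the $(2n-1)$-by-$(2n-1)$ circulant from $T$, pad $x$ with $n-1$ zeros, apply the circulant through (4.7.10), and keep the first $n$ entries. As in the circulant sketch, the transforms use a direct $O(n^2)$ DFT for brevity (helper names ours), and real input data is assumed so that the rounding-level imaginary parts are discarded.

```python
import cmath

def dft(x, sign=-1):
    """Direct transform with root exp(sign*2*pi*i/n) (an FFT in practice)."""
    n = len(x)
    w = cmath.exp(sign * 2j * cmath.pi / n)
    return [sum(w ** (k * j) * x[j] for j in range(n)) for k in range(n)]

def toeplitz_matvec(T, x):
    """y = T x for an n-by-n real Toeplitz T (list of rows) via circulant embedding."""
    n = len(T)
    # C(:,1) = [T(1:n,1); T(1,n:-1:2)^T]; C is (2n-1)-by-(2n-1)
    c = [T[i][0] for i in range(n)] + [T[0][j] for j in range(n - 1, 0, -1)]
    x_pad = list(x) + [0] * (n - 1)                  # x(n+1:2n-1) = 0
    z = [a * b for a, b in zip(dft(c), dft(x_pad))]  # (F c) .* (F x_pad)
    y = [w / len(c) for w in dft(z, sign=+1)]        # C x_pad via (4.7.10)
    return [w.real for w in y[:n]]                   # y(1:n) = T x(1:n)

print(toeplitz_matvec([[5, 2, 7], [4, 5, 2], [9, 4, 5]], [1, 2, 3]))
# approximately [30.0, 20.0, 32.0]
```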
Problems

P4.7.1 For any $v \in \mathbb{R}^n$ define the vectors $v_+ = (v + E_nv)/2$ and $v_- = (v - E_nv)/2$. Suppose $A \in \mathbb{R}^{n\times n}$ is symmetric and persymmetric. Show that if $Ax = b$ then $Ax_+ = b_+$ and $Ax_- = b_-$.

P4.7.2 Let $U \in \mathbb{R}^{n\times n}$ be the unit upper triangular matrix with the property that $U(1{:}k-1,k) = E_{k-1}y^{(k-1)}$ where $y^{(k)}$ is defined by (4.7.1). Show that

$$U^TT_nU = \operatorname{diag}(1, \beta_1, \ldots, \beta_{n-1}).$$

P4.7.3 Suppose $z \in \mathbb{R}^n$ and that $S \in \mathbb{R}^{n\times n}$ is orthogonal. Show that if

$$X = [z,\ Sz,\ \ldots,\ S^{n-1}z]$$

then $X^TX$ is Toeplitz.

P4.7.4 Consider the $LDL^T$ factorization of an $n$-by-$n$ symmetric, tridiagonal, positive definite Toeplitz matrix. Show that $d_n$ and $\ell_{n,n-1}$ converge as $n \to \infty$.

P4.7.5 Show that the product of two lower triangular Toeplitz matrices is Toeplitz.

P4.7.6 Give an algorithm for determining $\mu \in \mathbb{R}$ such that

$$T_n + \mu(e_ne_1^T + e_1e_n^T)$$

is singular. Assume $T_n = (r_{|i-j|})$ is positive definite, with $r_0 = 1$.

P4.7.7 Rewrite Algorithm 4.7.2 so that it does not require the vectors $z$ and $v$.

P4.7.8 Give an algorithm for computing $\kappa_\infty(T_k)$ for $k = 1{:}n$.

P4.7.9 Suppose $A_0$, $A_1$, $A_2$ and $A_3$ are $m$-by-$m$ matrices and that

$$A = \begin{bmatrix}A_0&A_1&A_2&A_3\\A_3&A_0&A_1&A_2\\A_2&A_3&A_0&A_1\\A_1&A_2&A_3&A_0\end{bmatrix}.$$

Show that there is a permutation matrix $\Pi$ such that $\Pi^TA\Pi = C = (C_{ij})$ where each $C_{ij}$ is a 4-by-4 circulant matrix.

P4.7.10 A $p$-by-$p$ block matrix $A = (A_{ij})$ with $m$-by-$m$ blocks is block Toeplitz if there exist $A_{-p+1},\ldots,A_{-1},A_0,A_1,\ldots,A_{p-1} \in \mathbb{R}^{m\times m}$ so that $A_{ij} = A_{j-i}$, e.g.,

$$A = \begin{bmatrix}A_0&A_1&A_2&A_3\\A_{-1}&A_0&A_1&A_2\\A_{-2}&A_{-1}&A_0&A_1\\A_{-3}&A_{-2}&A_{-1}&A_0\end{bmatrix}.$$

(a) Show that there is a permutation $\Pi$ such that

$$\Pi^TA\Pi = \begin{bmatrix}T_{11}&T_{12}&\cdots&T_{1m}\\T_{21}&T_{22}&\cdots&T_{2m}\\ \vdots&\vdots&&\vdots\\T_{m1}&T_{m2}&\cdots&T_{mm}\end{bmatrix}$$

where each $T_{ij}$ is $p$-by-$p$ and Toeplitz. Each $T_{ij}$ should be "made up" of $(i,j)$ entries selected from the $A_k$ matrices. (b) What can you say about the $T_{ij}$ if $A_k = A_{-k}$, $k = 1{:}p-1$?

P4.7.11 Show how to compute the solutions to the systems in (4.7.9) given that the solutions to the systems in (4.7.8) are available. Assume that all the matrices involved are nonsingular. Proceed to develop a fast unsymmetric Toeplitz solver for $Tx = b$ assuming that $T$'s leading principal submatrices are all nonsingular.

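P4.7.12 A matrix $H \in \mathbb{R}^{n\times n}$ is Hankel if $H(n{:}-1{:}1,:)$ is Toeplitz. Show that if $A \in \mathbb{R}^{n\times n}$ is defined by

$$a_{ij} = \int_a^b \cos(i\theta)\cos(j\theta)\,d\theta$$

then $A$ is the sum of a Hankel matrix and a Toeplitz matrix. Hint: Make use of the identity $\cos(u+v) = \cos(u)\cos(v) - \sin(u)\sin(v)$.

P4.7.13 Verify that $F_nC(v) = \operatorname{diag}(F_nv)F_n$.

P4.7.14 Show that it is possible to embed a symmetric Toeplitz matrix into a symmetric circulant matrix.

P4.7.15 Consider the $k$th order Yule-Walker system $T_ky^{(k)} = -r^{(k)}$ that arises in (4.7.1):

$$T_k\begin{bmatrix}y_{k1}\\ \vdots\\ y_{kk}\end{bmatrix} = -\begin{bmatrix}r_1\\ \vdots\\ r_k\end{bmatrix}.$$

Show that if

$$L = \begin{bmatrix}1&0&0&0&\cdots&0\\ y_{11}&1&0&0&\cdots&0\\ y_{22}&y_{21}&1&0&\cdots&0\\ y_{33}&y_{32}&y_{31}&1&\cdots&0\\ \vdots&\vdots&\vdots&\vdots&\ddots&\vdots\\ y_{n-1,n-1}&y_{n-1,n-2}&y_{n-1,n-3}&\cdots&y_{n-1,1}&1\end{bmatrix}$$

then $LT_nL^T = \operatorname{diag}(1, \beta_1, \ldots, \beta_{n-1})$ where $\beta_k = 1 + [r^{(k)}]^Ty^{(k)}$. Thus, the Durbin algorithm can be thought of as a fast method for computing the $LDL^T$ factorization of $T_n^{-1}$.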
Hankel matrices are constant along their antidiagonals and arise in several important areas.
"Systems of Equations," Numer. Math. 58, 100-127.
§5.2 The QR Factorization

§5.3 The Full Rank LS Problem

§5.4 Other Orthogonal Factorizations

§5.5 The Rank Deficient LS Problem

§5.6 Weighting and Iterative Improvement

§5.7 Square and Underdetermined Systems

This chapter is primarily concerned with the least squares solution of overdetermined systems of equations, i.e., the minimization of $\|Ax - b\|_2$ where $A \in \mathbb{R}^{m\times n}$ with $m > n$ and $b \in \mathbb{R}^m$. The most reliable solution procedures for this problem involve the reduction of $A$ to various canonical forms via orthogonal transformations. Householder reflections and Givens rotations are central to this process and we begin the chapter with a discussion of these important transformations. In §5.2 we discuss the computation of the factorization $A = QR$ where $Q$ is orthogonal and $R$ is upper triangular. This amounts to finding an orthonormal basis for $\operatorname{ran}(A)$. The QR factorization can be used to solve the full rank least squares problem as we show in §5.3. The technique is compared with the method of normal equations after a perturbation theory is developed. In §5.4 and §5.5 we consider methods for handling the difficult situation when $A$ is rank deficient (or nearly so). QR with column pivoting and the SVD are featured. In §5.6 we discuss several steps that can be taken to improve the quality of a computed least squares solution. Some remarks about square and underdetermined systems are offered in §5.7.


Before You Begin 


Chapters 1, 2, and 3 and §§4.1-4.3 are assumed. Within this chapter there are the following dependencies:

                      §5.6
                       ↑
    §5.1 → §5.2 → §5.3 → §5.4 → §5.5
                       ↓
                      §5.7


Complementary references include Lawson and Hanson (1974), Farebrother (1987), and Björck (1996). See also Stewart (1973), Hager (1988), Stewart and Sun (1990), Watkins (1991), Gill, Murray, and Wright (1991), Higham (1996), Trefethen and Bau (1996), and Demmel (1996). Some MATLAB functions important to this chapter are qr, svd, pinv, orth, rank, and the "backslash" operator "\". LAPACK connections include


LAPACK: Householder/Givens Tools
Generates a Householder matrix
Householder times matrix
Small n Householder times matrix
Block Householder times matrix
Computes I - VTV^T block reflector representation
Generates a plane rotation
Generates a vector of plane rotations
Applies a vector of plane rotations to a vector pair
Applies rotation sequence to a matrix
Real rotation times complex vector pair
Complex rotation (c real) times complex vector pair
Complex rotation (B real) times complex vector pair

Q (factored form) times matrix (real case)
Q (factored form) times matrix (complex case)


pper triangular, 
A = QL = (orthogonal)(lower triangular)
A = LQ = (lower triangular)(orthogonal)
A = RQ where A is upper trapezoidal

Bidiagonalization of general matrix
Generates the orthogonal transformations
Bidiagonalization of band matrix

_GELSS | SVD solution to min ||AX - B||_F
_GELSX | Complete orthogonal decomposition solution to min ||AX - B||_F
_GEEQU | Equilibrates general matrix to reduce condition


5.1 Householder and Givens Matrices 


Recall that $Q \in \mathbb{R}^{m\times m}$ is orthogonal if $Q^TQ = QQ^T = I_m$. Orthogonal matrices have an important role to play in least squares and eigenvalue computations. In this section we introduce the key players in this game: Householder reflections and Givens rotations.


5.1.1 A 2-by-2 Preview 
It is instructive to examine the geometry associated with rotations and reflections at the $n = 2$ level. A 2-by-2 orthogonal matrix $Q$ is a rotation if it has the form

$$Q = \begin{bmatrix}\cos(\theta) & \sin(\theta)\\ -\sin(\theta) & \cos(\theta)\end{bmatrix}.$$

If $y = Q^Tx$, then $y$ is obtained by rotating $x$ counterclockwise through an angle $\theta$.

A 2-by-2 orthogonal matrix $Q$ is a reflection if it has the form

$$Q = \begin{bmatrix}\cos(\theta) & \sin(\theta)\\ \sin(\theta) & -\cos(\theta)\end{bmatrix}.$$

If $y = Q^Tx = Qx$, then $y$ is obtained by reflecting the vector $x$ across the line defined by

$$S = \operatorname{span}\left\{\begin{bmatrix}\cos(\theta/2)\\ \sin(\theta/2)\end{bmatrix}\right\}.$$

Reflections and rotations are computationally attractive because they are easily constructed and because they can be used to introduce zeros in a vector by properly choosing the rotation angle or the reflection plane.

Example 5.1.1 Suppose $x = [1,\ \sqrt{3}]^T$. If we set

$$Q = \begin{bmatrix}\cos(-60^\circ) & \sin(-60^\circ)\\ -\sin(-60^\circ) & \cos(-60^\circ)\end{bmatrix} = \begin{bmatrix}1/2 & -\sqrt{3}/2\\ \sqrt{3}/2 & 1/2\end{bmatrix}$$

then $Q^Tx = [2,\ 0]^T$. Thus, a rotation of $-60^\circ$ zeros the second component of $x$. If

$$Q = \begin{bmatrix}\cos(60^\circ) & \sin(60^\circ)\\ \sin(60^\circ) & -\cos(60^\circ)\end{bmatrix} = \begin{bmatrix}1/2 & \sqrt{3}/2\\ \sqrt{3}/2 & -1/2\end{bmatrix}$$

then $Q^Tx = [2,\ 0]^T$. Thus, by reflecting $x$ across the $30^\circ$ line we can zero its second component.
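Example 5.1.1 can be checked numerically. The Python sketch below (helper names ours) builds the rotation and reflection forms of §5.1.1 for a given $\theta$ and applies $Q^T$. Reflecting across the $30^\circ$ line corresponds to $\theta = 60^\circ$ in the reflection form, since that line is $\operatorname{span}\{[\cos(\theta/2), \sin(\theta/2)]^T\}$.

```python
import math

def rotation(theta):
    """Q = [[cos, sin], [-sin, cos]]; Q^T x rotates x counterclockwise by theta."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, s], [-s, c]]

def reflection(theta):
    """Q = [[cos, sin], [sin, -cos]]; Q x reflects x across span{(cos(theta/2), sin(theta/2))}."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, s], [s, -c]]

def apply_transpose(Q, x):
    """Return Q^T x for a 2-by-2 Q stored as a list of rows."""
    return [Q[0][0] * x[0] + Q[1][0] * x[1],
            Q[0][1] * x[0] + Q[1][1] * x[1]]

x = [1.0, math.sqrt(3.0)]
print(apply_transpose(rotation(math.radians(-60)), x))    # approximately [2.0, 0.0]
print(apply_transpose(reflection(math.radians(60)), x))   # approximately [2.0, 0.0]
```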
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5.1.2 Householder Reflections 
Let v € R” be nonzero, An n-by-n matrix P of the form 


P=I- aw (5.1.1) 


is called a Householder reflection. (Synonyms: Householder matrix, House- 
holder transformation.) The vector v is called a Householder vector. If a 
vector z is multiplied by P, then it is reflected in the hyperplane span{v}+. 
It is easy to verify that Householder matrices are symmetric and orthogonal. 

Householder reflections are similar in two ways to Gauss transforma- 
tions, which we introduced in §3.2.1. They are rank-1 modifications of the 
identity and they can be used to zero selected components of a vector. In 
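particular, suppose we are given $0 \ne x \in \mathbb{R}^n$ and want $Px$ to be a multiple of $e_1 = I_n(:,1)$. Note that
$$Px = \left(I - \frac{2vv^T}{v^Tv}\right)x = x - \frac{2v^Tx}{v^Tv}\,v$$
and $Px \in \operatorname{span}\{e_1\}$ imply $v \in \operatorname{span}\{x, e_1\}$. Setting $v = x + \alpha e_1$ gives
$$v^Tx = x^Tx + \alpha x_1$$
and
$$v^Tv = x^Tx + 2\alpha x_1 + \alpha^2,$$
and therefore
$$Px = \left(1 - 2\,\frac{x^Tx + \alpha x_1}{x^Tx + 2\alpha x_1 + \alpha^2}\right)x - 2\alpha\,\frac{v^Tx}{v^Tv}\,e_1.$$
In order for the coefficient of $x$ to be zero, we set $\alpha = \pm\|x\|_2$, for then
$$v = x \pm \|x\|_2 e_1 \;\Rightarrow\; Px = \left(I - 2\frac{vv^T}{v^Tv}\right)x = \mp\|x\|_2 e_1. \qquad (5.1.2)$$
It is this simple determination of $v$ that makes the Householder reflection so useful.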
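Example 5.1.2 If $x = [3, 1, 5, 1]^T$ and $v = [9, 1, 5, 1]^T$, then
$$P = I - 2\frac{vv^T}{v^Tv} = \frac{1}{54}\begin{bmatrix} -27 & -9 & -45 & -9 \\ -9 & 53 & -5 & -1 \\ -45 & -5 & 29 & -5 \\ -9 & -1 & -5 & 53 \end{bmatrix}$$
has the property that $Px = [-6, 0, 0, 0]^T$.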
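The construction (5.1.2) can be checked on Example 5.1.2 with a short Python sketch (standard library only; `householder_via_512` is a hypothetical helper name). It applies $P$ through $Px = x - (2v^Tx/v^Tv)\,v$ instead of forming $P$.

```python
import math

def householder_via_512(x):
    """Return (v, Px) for v = x + ||x||_2 e_1, following (5.1.2).

    P = I - 2 v v^T / (v^T v) is never formed: Px = x - (2 v^T x / v^T v) v.
    With this sign choice, (5.1.2) predicts Px = -||x||_2 e_1.
    """
    alpha = math.sqrt(sum(t * t for t in x))      # alpha = ||x||_2
    v = list(x)
    v[0] += alpha                                  # v = x + alpha e_1
    vtv = sum(t * t for t in v)
    coef = 2.0 * sum(a * b for a, b in zip(v, x)) / vtv
    Px = [xi - coef * vi for xi, vi in zip(x, v)]
    return v, Px

v, Px = householder_via_512([3.0, 1.0, 5.0, 1.0])
print(v, Px)
```

For the data of Example 5.1.2 this reproduces $v = [9, 1, 5, 1]^T$ and $Px = [-6, 0, 0, 0]^T$ (up to rounding).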


210 CHAPTER 5. ORTHOGONALIZATION AND LEAST SQUARES 


5.1.3 Computing the Householder Vector 


There are a number of important practical details associated with the deter- 
mination of a Householder matrix, i.e., the determination of a Householder 
vector. One concerns the choice of sign in the definition of v in (5.1.2). 
Setting
$$v_1 = x_1 - \|x\|_2$$
has the nice property that $Px$ is a positive multiple of $e_1$. But this recipe is dangerous if $x$ is close to a positive multiple of $e_1$ because severe cancellation would occur. However, the formula
$$v_1 = x_1 - \|x\|_2 = \frac{x_1^2 - \|x\|_2^2}{x_1 + \|x\|_2} = \frac{-(x_2^2 + \cdots + x_n^2)}{x_1 + \|x\|_2}$$
suggested by Parlett (1971) does not suffer from this defect in the $x_1 > 0$ case.
In practice, it is handy to normalize the Householder vector so that v(1) = 1. This permits the storage of v(2:n) where the zeros have been introduced in x, i.e., x(2:n). We refer to v(2:n) as the essential part of the Householder vector. Recalling that $\beta = 2/v^Tv$ and letting length(x) specify vector dimension, we obtain the following encapsulation:

Algorithm 5.1.1 (Householder Vector) Given $x \in \mathbb{R}^n$, this function computes $v \in \mathbb{R}^n$ with v(1) = 1 and $\beta \in \mathbb{R}$ such that $P = I_n - \beta vv^T$ is orthogonal and $Px = \|x\|_2 e_1$.

    function: [v, β] = house(x)
        n = length(x)
        σ = x(2:n)^T x(2:n)
        v = [1; x(2:n)]
        if σ = 0
            β = 0
        else
            μ = sqrt(x(1)^2 + σ)
            if x(1) <= 0
                v(1) = x(1) - μ
            else
                v(1) = -σ/(x(1) + μ)
            end
            β = 2v(1)^2/(σ + v(1)^2)
            v = v/v(1)
        end


This algorithm involves about 3n flops and renders a computed Householder matrix that is orthogonal to machine precision, a concept discussed below. A production version of Algorithm 5.1.1 may involve a preliminary scaling of the x vector ($x \leftarrow x/\|x\|$) to avoid overflow.
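A transcription of Algorithm 5.1.1 into Python might look as follows. This is a sketch, not a production routine: it uses plain lists, 0-based indexing, and no preliminary scaling; `apply_house` is a hypothetical helper that applies $P = I - \beta vv^T$ without forming it. Like the pseudocode, it returns β = 0 (so P = I) when x(2:n) = 0.

```python
import math

def house(x):
    """Sketch of Algorithm 5.1.1: return (v, beta) with v[0] = 1 such that
    P = I - beta v v^T is orthogonal and P x = ||x||_2 e_1 (when x[1:] != 0)."""
    n = len(x)
    sigma = sum(t * t for t in x[1:n])        # sigma = x(2:n)^T x(2:n)
    v = [1.0] + list(x[1:n])
    if sigma == 0.0:
        return v, 0.0                         # as in the pseudocode: P = I
    mu = math.sqrt(x[0] ** 2 + sigma)         # mu = ||x||_2
    if x[0] <= 0:
        v0 = x[0] - mu
    else:
        v0 = -sigma / (x[0] + mu)             # Parlett's cancellation-free form
    beta = 2.0 * v0 ** 2 / (sigma + v0 ** 2)
    v = [1.0] + [t / v0 for t in x[1:n]]      # v = v / v(1)
    return v, beta

def apply_house(v, beta, x):
    """Return (I - beta v v^T) x without forming the matrix."""
    w = beta * sum(a * b for a, b in zip(v, x))
    return [xi - w * vi for xi, vi in zip(x, v)]

x = [3.0, 1.0, 5.0, 1.0]
v, beta = house(x)
print(v, beta, apply_house(v, beta, x))   # last vector is approximately [6, 0, 0, 0]
```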


5.1.4 Applying Householder Matrices 


It is critical to exploit structure when applying a Householder reflection to 
a matrix. If $A \in \mathbb{R}^{m\times n}$ and $P = I - \beta vv^T \in \mathbb{R}^{m\times m}$, then
$$PA = (I - \beta vv^T)A = A - vw^T$$
where $w = \beta A^Tv$. Likewise, if $P = I - \beta vv^T \in \mathbb{R}^{n\times n}$, then
$$AP = A(I - \beta vv^T) = A - wv^T$$
where $w = \beta Av$. Thus, an m-by-n Householder update involves a matrix-vector multiplication and an outer product update. It requires 4mn flops. Failure to recognize this and to treat P as a general matrix increases work by an order of magnitude. Householder updates never entail the explicit formation of the Householder matrix.

Both of the above Householder updates can be implemented in a way 
that exploits the fact that v(1) = 1. This feature can be important in the computation of PA when m is small and in the computation of AP when n is small.
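The two structured updates can be sketched in Python as follows (standard library only; `house_left` and `house_right` are hypothetical names). The inputs `v` and `beta` are assumed values, worked out by hand as the normalized Householder vector for $x = [3, 1, 5, 1]^T$, so the first column of PA and the first row of CP should collapse to multiples of $e_1$.

```python
def house_left(A, v, beta):
    """Overwrite A (m-by-n nested lists) with (I - beta v v^T) A.

    Uses PA = A - v w^T with w = beta A^T v: one matrix-vector product and
    one rank-1 update (about 4mn flops). P itself is never formed.
    """
    m, n = len(A), len(A[0])
    w = [beta * sum(A[i][j] * v[i] for i in range(m)) for j in range(n)]
    for i in range(m):
        for j in range(n):
            A[i][j] -= v[i] * w[j]
    return A

def house_right(A, v, beta):
    """Overwrite A (m-by-n) with A (I - beta v v^T) via AP = A - w v^T, w = beta A v."""
    m, n = len(A), len(A[0])
    w = [beta * sum(A[i][j] * v[j] for j in range(n)) for i in range(m)]
    for i in range(m):
        for j in range(n):
            A[i][j] -= w[i] * v[j]
    return A

# Assumed inputs: normalized Householder vector for x = [3, 1, 5, 1]
# (v(1) = 1, beta = 2 / v^T v = 1/2), worked out by hand for this example.
v = [1.0, -1.0 / 3.0, -5.0 / 3.0, -1.0 / 3.0]
beta = 0.5
A = house_left([[3.0, 1.0], [1.0, 2.0], [5.0, 0.0], [1.0, 1.0]], v, beta)
C = house_right([[3.0, 1.0, 5.0, 1.0], [1.0, 2.0, 0.0, 1.0]], v, beta)
print(A, C)
```

Because P is symmetric, the second column of PA and the second row of CP agree here.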

As an example of a Householder matrix update, suppose we want to 
overwrite $A \in \mathbb{R}^{m\times n}$ ($m \ge n$) with $B = Q^TA$ where $Q$ is an orthogonal matrix chosen so that B(j+1:m, j) = 0 for some j that satisfies $1 \le j \le n$. In addition, suppose A(j:m, 1:j-1) = 0 and that we want to store the essential part of the Householder vector in A(j+1:m, j). The following instructions accomplish this task:

    [v, β] = house(A(j:m, j))
    A(j:m, j:n) = (I_{m-j+1} - βvv^T) A(j:m, j:n)
    A(j+1:m, j) = v(2:m-j+1)


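From the computational point of view, we have applied an order m - j + 1 Householder matrix to the bottom m - j + 1 rows of A. However, mathematically we have also applied the m-by-m Householder matrix
$$\tilde{P} = \begin{bmatrix} I_{j-1} & 0 \\ 0 & P \end{bmatrix} = I_m - \beta\tilde{v}\tilde{v}^T, \qquad \tilde{v} = \begin{bmatrix} 0 \\ v \end{bmatrix}$$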


to A in its entirety. Regardless, the "essential" part of the Householder 
vector can be recorded in the zeroed portion of A. 
5.1.5 Roundoff Properties


The roundoff properties associated with Householder matrices are very favorable. Wilkinson (1965, pp. 152-62) shows that house produces a Householder vector $\hat{v}$ very near the exact $v$. If $\hat{P} = I - 2\hat{v}\hat{v}^T/\hat{v}^T\hat{v}$, then
$$\|\hat{P} - P\|_2 = O(\mathbf{u})$$
meaning that $\hat{P}$ is orthogonal to machine precision. Moreover, the computed updates with $\hat{P}$ are close to the exact updates with $P$:
$$fl(\hat{P}A) = P(A + E), \qquad \|E\|_2 = O(\mathbf{u}\|A\|_2)$$
$$fl(A\hat{P}) = (A + E)P, \qquad \|E\|_2 = O(\mathbf{u}\|A\|_2)$$


5.1.6 Factored Form Representation 


Many Householder-based factorization algorithms that are presented in the following sections compute products of Householder matrices
$$Q = Q_1Q_2\cdots Q_r, \qquad Q_j = I - \beta_j v^{(j)}v^{(j)T} \qquad (5.1.3)$$
where $r \le n$ and each $v^{(j)}$ has the form
$$v^{(j)} = (\underbrace{0, 0, \ldots, 0}_{j-1}, 1, v^{(j)}_{j+1}, \ldots, v^{(j)}_n)^T.$$
It is usually not necessary to compute Q explicitly even if it is involved in subsequent calculations. For example, if $C \in \mathbb{R}^{n\times q}$ and we wish to compute $Q^TC$, then we merely execute the loop


    for j = 1:r
        C = Q_j C
    end


The storage of the Householder vectors $v^{(1)}, \ldots, v^{(r)}$ and the corresponding $\beta_j$ (if convenient) amounts to a factored form representation of Q. To illustrate the economies of the factored form representation, suppose that we have an array A and that A(j+1:n, j) houses $v^{(j)}$(j+1:n), the essential part of the jth Householder vector. The overwriting of $C \in \mathbb{R}^{n\times q}$ with $Q^TC$ can then be implemented as follows:

    for j = 1:r
        v(j:n) = [1; A(j+1:n, j)]
        C(j:n, :) = (I - β_j v(j:n)v(j:n)^T) C(j:n, :)        (5.1.4)
    end

This involves about 2qr(2n - r) flops. If Q is explicitly represented as an n-by-n matrix, $Q^TC$ would involve $2n^2q$ flops.
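Loop (5.1.4) can be sketched in Python under the text's storage assumption (essential parts below the diagonal of an array, β_j kept in a separate list). The function name `apply_qt_factored` and the `betas` argument are hypothetical; indices are 0-based. The sample data are the two Householder vectors $[1, .6, 0, .8]^T$ and $[0, 1, .8, .6]^T$, each with $v^Tv = 2$, so $\beta_j = 1$.

```python
def apply_qt_factored(A, betas, C):
    """Overwrite C (n-by-q) with Q^T C for Q = Q_1 ... Q_r held in factored form (5.1.4).

    Assumed storage: for 0-based j, A[j+1:n][j] holds the essential part of
    v^(j); its leading entry v^(j)(j) = 1 is implicit and betas[j] is stored
    separately. Only rows j:n of C change at step j.
    """
    n, q = len(C), len(C[0])
    for j, beta in enumerate(betas):
        v = [1.0] + [A[i][j] for i in range(j + 1, n)]          # v(j:n)
        for col in range(q):
            s = beta * sum(v[t] * C[j + t][col] for t in range(n - j))
            for t in range(n - j):
                C[j + t][col] -= s * v[t]    # C(j:n,col) -= beta v (v^T C(j:n,col))
    return C

store = [[0.0, 0.0], [0.6, 0.0], [0.0, 0.8], [0.8, 0.6]]
QT = apply_qt_factored(store, [1.0, 1.0],
                       [[float(i == j) for j in range(4)] for i in range(4)])
print(QT)
```

Applying the loop to the identity yields $Q^T$ itself, which makes the result easy to check for orthogonality.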

Of course, in some applications, it is necessary to explicitly form Q 
(or parts of it). Two possible algorithms for computing the Householder 
product matrix Q in (5.1.3) are forward accumulation, 


    Q = I_n
    for j = 1:r
        Q = QQ_j
    end

and backward accumulation,

    Q = I_n
    for j = r:-1:1
        Q = Q_jQ
    end


Recall that the leading (j - 1)-by-(j - 1) portion of $Q_j$ is the identity. Thus,
at the beginning of backward accumulation, Q is “mostly the identity” and 
it gradually becomes full as the iteration progresses. This pattern can be 
exploited to reduce the number of required flops. In contrast, Q is full 
in forward accumulation after the first step. For this reason, backward 
accumulation is cheaper and the strategy of choice: 


    Q = I_n
    for j = r:-1:1
        v(j:n) = [1; A(j+1:n, j)]
        Q(j:n, j:n) = (I - β_j v(j:n)v(j:n)^T) Q(j:n, j:n)        (5.1.5)
    end

This involves about $4(n^2r - nr^2 + r^3/3)$ flops.
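Backward accumulation (5.1.5) can be sketched in Python with the same assumed storage layout (0-based indices; `form_q_backward` is a hypothetical name). The sample vectors are again $[1, .6, 0, .8]^T$ and $[0, 1, .8, .6]^T$ with $\beta_j = 1$.

```python
def form_q_backward(A, betas, n):
    """Form Q = Q_1 ... Q_r explicitly by backward accumulation (5.1.5).

    Assumed storage as in the text: for 0-based j, A[j+1:n][j] holds the
    essential part of v^(j). At step j only Q(j:n, j:n) is updated, because
    the product Q_{j+1} ... Q_r accumulated so far is the identity outside it.
    """
    Q = [[float(i == k) for k in range(n)] for i in range(n)]
    for j in range(len(betas) - 1, -1, -1):                 # j = r:-1:1
        v = [1.0] + [A[i][j] for i in range(j + 1, n)]
        for col in range(j, n):
            s = betas[j] * sum(v[t] * Q[j + t][col] for t in range(n - j))
            for t in range(n - j):
                Q[j + t][col] -= s * v[t]
    return Q

store = [[0.0, 0.0], [0.6, 0.0], [0.0, 0.8], [0.8, 0.6]]
Q = form_q_backward(store, [1.0, 1.0], 4)
print(Q[0])
```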


5.1.7 A Block Representation 


Suppose $Q = Q_1\cdots Q_r$ is a product of n-by-n Householder matrices as in (5.1.3). Since each $Q_j$ is a rank-one modification of the identity, it follows from the structure of the Householder vectors that Q is a rank-r modification of the identity and can be written in the form
$$Q = I + WY^T \qquad (5.1.6)$$


where W and Y are n-by-r matrices. The key to computing the block 
representation (5.1.6) is the following lemma. 
Lemma 5.1.1 Suppose $Q = I + WY^T$ is an n-by-n orthogonal matrix with $W, Y \in \mathbb{R}^{n\times j}$. If $P = I - \beta vv^T$ with $v \in \mathbb{R}^n$ and $z = -\beta Qv$, then
$$Q_+ = QP = I + W_+Y_+^T$$
where $W_+ = [W\; z]$ and $Y_+ = [Y\; v]$ are each n-by-(j+1).
Proof.
$$QP = (I + WY^T)(I - \beta vv^T) = I + WY^T - \beta Qvv^T = I + WY^T + zv^T = I + [W\; z][Y\; v]^T. \qquad \square$$


By repeatedly applying the lemma, we can generate the block representa- 
tion of Q in (5.1.3) from the factored form representation as follows: 


Algorithm 5.1.2 Suppose $Q = Q_1\cdots Q_r$ is a product of n-by-n Householder matrices as described in (5.1.3). This algorithm computes matrices $W, Y \in \mathbb{R}^{n\times r}$ such that $Q = I + WY^T$.

    Y = v^(1)
    W = -β_1 v^(1)
    for j = 2:r
        z = -β_j (I + WY^T) v^(j)
        W = [W z]
        Y = [Y v^(j)]
    end

This algorithm involves about $2r^2n - 2r^3/3$ flops if the zeros in the $v^{(j)}$ are exploited. Note that Y is merely the matrix of Householder vectors and is therefore unit lower triangular. Clearly, the central task in the generation of the WY representation (5.1.6) is the computation of the W matrix.

The block representation for products of Householder matrices is attractive in situations where Q must be applied to a matrix. Suppose $C \in \mathbb{R}^{n\times q}$. It follows that the operation
$$C \leftarrow Q^TC = (I + WY^T)^TC = C + Y(W^TC)$$
is rich in level-3 operations. On the other hand, if Q is in factored form, $Q^TC$ is just rich in the level-2 operations of matrix-vector multiplication and outer product updates. Of course, in this context the distinction between level-2 and level-3 diminishes as C gets narrower.

We mention that the "WY" representation is not a generalized Householder transformation from the geometric point of view. True block reflectors have the form $Q = I - 2VV^T$ where $V \in \mathbb{R}^{n\times r}$ satisfies $V^TV = I_r$. See Schreiber and Parlett (1987) and also Schreiber and Van Loan (1989).


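Example 5.1.3 If n = 4, r = 2, and $[1, .6, 0, .8]^T$ and $[0, 1, .8, .6]^T$ are the Householder vectors associated with $Q_1$ and $Q_2$ respectively, then
$$Q_1Q_2 = I + WY^T = I_4 + \begin{bmatrix} -1 & 1.080 \\ -.6 & -.352 \\ 0 & -.800 \\ -.8 & .264 \end{bmatrix}\begin{bmatrix} 1 & .6 & 0 & .8 \\ 0 & 1 & .8 & .6 \end{bmatrix}.$$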
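Algorithm 5.1.2 can be sketched in Python and checked against the example just above (both vectors have $v^Tv = 2$, so $\beta_1 = \beta_2 = 1$). This is a sketch only: `wy_representation` is a hypothetical name, W and Y are stored as lists of columns, and the zeros in the Householder vectors are not exploited.

```python
def wy_representation(vs, betas):
    """Sketch of Algorithm 5.1.2: return W, Y (lists of n-vectors = columns) with
    Q_1 Q_2 ... Q_r = I + W Y^T, where Q_j = I - betas[j] v v^T and v = vs[j]."""
    n = len(vs[0])
    Y = [list(vs[0])]
    W = [[-betas[0] * t for t in vs[0]]]                          # W = -beta_1 v^(1)
    for v, beta in zip(vs[1:], betas[1:]):
        ytv = [sum(y[i] * v[i] for i in range(n)) for y in Y]      # Y^T v
        qv = [v[i] + sum(c * w[i] for c, w in zip(ytv, W)) for i in range(n)]  # (I + W Y^T) v
        W.append([-beta * t for t in qv])                          # z = -beta (I + W Y^T) v
        Y.append(list(v))
    return W, Y

W, Y = wy_representation([[1.0, 0.6, 0.0, 0.8], [0.0, 1.0, 0.8, 0.6]], [1.0, 1.0])
print(W)   # second column is approximately [1.08, -0.352, -0.8, 0.264]
```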


5.1.8 Givens Rotations 


Householder reflections are exceedingly useful for introducing zeros on a 
grand scale, e.g., the annihilation of all but the first component of a vec- 
tor. However, in calculations where it is necessary to zero elements more 
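selectively, Givens rotations are the transformation of choice. These are rank-two corrections to the identity of the form
$$G(i,k,\theta) = \begin{bmatrix}
1 & \cdots & 0 & \cdots & 0 & \cdots & 0 \\
\vdots & \ddots & \vdots & & \vdots & & \vdots \\
0 & \cdots & c & \cdots & s & \cdots & 0 \\
\vdots & & \vdots & \ddots & \vdots & & \vdots \\
0 & \cdots & -s & \cdots & c & \cdots & 0 \\
\vdots & & \vdots & & \vdots & \ddots & \vdots \\
0 & \cdots & 0 & \cdots & 0 & \cdots & 1
\end{bmatrix} \qquad (5.1.7)$$
where $c = \cos(\theta)$ and $s = \sin(\theta)$ for some $\theta$, and the entries $c$, $s$, $-s$, $c$ lie in rows and columns $i$ and $k$. Givens rotations are clearly orthogonal.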

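Premultiplication by $G(i,k,\theta)^T$ amounts to a counterclockwise rotation of $\theta$ radians in the (i, k) coordinate plane. Indeed, if $x \in \mathbb{R}^n$ and $y = G(i,k,\theta)^Tx$, then
$$y_j = \begin{cases} cx_i - sx_k & j = i \\ sx_i + cx_k & j = k \\ x_j & j \ne i, k \end{cases}$$
From these formulae it is clear that we can force $y_k$ to be zero by setting
$$c = \frac{x_i}{\sqrt{x_i^2 + x_k^2}}, \qquad s = \frac{-x_k}{\sqrt{x_i^2 + x_k^2}}. \qquad (5.1.8)$$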
Thus, it is a simple matter to zero a specified entry in a vector by using a 
Givens rotation. In practice, there are better ways to compute c and s than 
(5.1.8). The following algorithm, for example, guards against overflow. 
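Algorithm 5.1.3 Given scalars a and b, this function computes $c = \cos(\theta)$ and $s = \sin(\theta)$ so
$$\begin{bmatrix} c & s \\ -s & c \end{bmatrix}^T \begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} r \\ 0 \end{bmatrix}.$$

    function: [c, s] = givens(a, b)
        if b = 0
            c = 1; s = 0
        else
            if |b| > |a|
                τ = -a/b; s = 1/sqrt(1 + τ^2); c = sτ
            else
                τ = -b/a; c = 1/sqrt(1 + τ^2); s = cτ
            end
        end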


This algorithm requires 5 flops and a single square root. Note that it does not compute θ and so it does not involve inverse trigonometric functions.


Example 5.1.4 If $x = [1, 2, 3, 4]^T$, $\cos(\theta) = 1/\sqrt{5}$, and $\sin(\theta) = -2/\sqrt{5}$, then $G(2,4,\theta)^Tx = [1, \sqrt{20}, 3, 0]^T$.
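Algorithm 5.1.3 translates directly into Python; the sketch below (standard library only; `rotate_pair` is a hypothetical helper) also applies the rotation to the (2,4) pair from Example 5.1.4.

```python
import math

def givens(a, b):
    """Sketch of Algorithm 5.1.3: return (c, s) with [c s; -s c]^T [a; b] = [r; 0]."""
    if b == 0:
        return 1.0, 0.0
    if abs(b) > abs(a):
        tau = -a / b
        s = 1.0 / math.sqrt(1.0 + tau * tau)
        return s * tau, s          # c = s * tau
    tau = -b / a
    c = 1.0 / math.sqrt(1.0 + tau * tau)
    return c, c * tau              # s = c * tau

def rotate_pair(c, s, xi, xk):
    """Components i and k of y = G(i,k,theta)^T x; all other components are unchanged."""
    return c * xi - s * xk, s * xi + c * xk

# Example 5.1.4 data: x = [1, 2, 3, 4], rotating in the (2,4) plane.
c, s = givens(2.0, 4.0)
yi, yk = rotate_pair(c, s, 2.0, 4.0)
print(c, s, yi, yk)
```

Here givens returns $c = -1/\sqrt{5}$, $s = 2/\sqrt{5}$, the negatives of the values in Example 5.1.4, so $y_2 = -\sqrt{20}$; the fourth component is zeroed either way.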


5.1.9 Applying Givens Rotations 


It is critical that the simple structure of a Givens rotation matrix be exploited when it is involved in a matrix multiplication. Suppose $A \in \mathbb{R}^{m\times n}$, $c = \cos(\theta)$, and $s = \sin(\theta)$. If $G(i,k,\theta) \in \mathbb{R}^{m\times m}$, then the update $A \leftarrow G(i,k,\theta)^TA$ affects just two rows of A,
$$A([i, k], :) = \begin{bmatrix} c & s \\ -s & c \end{bmatrix}^T A([i, k], :)$$
and requires just 6n flops:

    for j = 1:n
        τ1 = A(i, j)
        τ2 = A(k, j)
        A(i, j) = cτ1 - sτ2
        A(k, j) = sτ1 + cτ2
    end

Likewise, if $G(i,k,\theta) \in \mathbb{R}^{n\times n}$, then the update $A \leftarrow AG(i,k,\theta)$ affects just two columns of A,
$$A(:, [i, k]) = A(:, [i, k])\begin{bmatrix} c & s \\ -s & c \end{bmatrix}$$
and requires just 6m flops:
5.1. HOUSEHOLDER AND GIVENS MATRICES 217 


    for j = 1:m
        τ1 = A(j, i)
        τ2 = A(j, k)
        A(j, i) = cτ1 - sτ2
        A(j, k) = sτ1 + cτ2
    end
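The two loops above can be sketched in Python as follows (0-based row and column indices; `rot_rows` and `rot_cols` are hypothetical names). The sample uses the (c, s) of Example 5.1.4 with rows/columns 1 and 3, which are rows/columns 2 and 4 in the text's 1-based numbering.

```python
import math

def rot_rows(A, i, k, c, s):
    """A <- G(i,k,theta)^T A (0-based i, k): only rows i and k change; 6n flops."""
    for j in range(len(A[0])):
        t1, t2 = A[i][j], A[k][j]
        A[i][j] = c * t1 - s * t2
        A[k][j] = s * t1 + c * t2
    return A

def rot_cols(A, i, k, c, s):
    """A <- A G(i,k,theta) (0-based i, k): only columns i and k change; 6m flops."""
    for j in range(len(A)):
        t1, t2 = A[j][i], A[j][k]
        A[j][i] = c * t1 - s * t2
        A[j][k] = s * t1 + c * t2
    return A

c, s = 1 / math.sqrt(5), -2 / math.sqrt(5)     # the (c, s) of Example 5.1.4
A = rot_rows([[1.0, 0.0], [2.0, 1.0], [3.0, 0.0], [4.0, 1.0]], 1, 3, c, s)
B = rot_cols([[1.0, 2.0, 3.0, 4.0]], 1, 3, c, s)
print(A, B)
```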


5.1.10 Roundoff Properties 


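The numerical properties of Givens rotations are as favorable as those for Householder reflections. In particular, it can be shown that the computed $\hat{c}$ and $\hat{s}$ in givens satisfy
$$\hat{c} = c(1 + \epsilon_c), \qquad \epsilon_c = O(\mathbf{u})$$
$$\hat{s} = s(1 + \epsilon_s), \qquad \epsilon_s = O(\mathbf{u}).$$
If $\hat{c}$ and $\hat{s}$ are subsequently used in a Givens update, then the computed update is the exact update of a nearby matrix:
$$fl[\hat{G}(i,k,\theta)^TA] = G(i,k,\theta)^T(A + E), \qquad \|E\|_2 \approx \mathbf{u}\|A\|_2$$
$$fl[A\hat{G}(i,k,\theta)] = (A + E)G(i,k,\theta), \qquad \|E\|_2 \approx \mathbf{u}\|A\|_2$$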

A detailed error analysis of Givens rotations may be found in Wilkinson 
(1965, pp. 131-39). 


5.1.11 Representing Products of Givens Rotations 


Suppose $Q = G_1\cdots G_t$ is a product of Givens rotations. As we have seen in connection with Householder reflections, it is more economical to keep the orthogonal matrix Q in factored form than to compute explicitly the product of the rotations. Using a technique demonstrated by Stewart (1976), it is possible to do this in a very compact way. The idea is to associate a single floating point number ρ with each rotation. Specifically, if
$$Z = \begin{bmatrix} c & s \\ -s & c \end{bmatrix}, \qquad c^2 + s^2 = 1,$$
then we define the scalar ρ by

    if c = 0
        ρ = 1
    elseif |s| < |c|
        ρ = sign(c)s/2                    (5.1.9)
    else
        ρ = 2sign(s)/c
    end
Essentially, this amounts to storing s/2 if the sine is smaller and 2/c if the cosine is smaller. With this encoding, it is possible to reconstruct ±Z as follows:

    if ρ = 1
        c = 0; s = 1
    elseif |ρ| < 1
        s = 2ρ; c = sqrt(1 - s^2)        (5.1.10)
    else
        c = 2/ρ; s = sqrt(1 - c^2)
    end

That -Z may be generated is usually of no consequence, for if Z zeros a particular matrix entry, so does -Z. The reason for essentially storing the smaller of c and s is that the formula $\sqrt{1 - x^2}$ renders poor results if x is near unity. More details may be found in Stewart (1976). Of course, to "reconstruct" G(i, k, θ) we need i and k in addition to the associated ρ. This usually poses no difficulty as we discuss in §5.2.3.
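The encoding (5.1.9) and its inverse (5.1.10) can be sketched in Python as below (function names hypothetical; `math.copysign(1.0, t)` plays the role of sign(t), which is only evaluated where t is nonzero). The round trip returns either (c, s) or (-c, -s).

```python
import math

def encode_rho(c, s):
    """Stewart's encoding (5.1.9) of Z = [c s; -s c] as a single float rho."""
    if c == 0:
        return 1.0
    if abs(s) < abs(c):
        return math.copysign(1.0, c) * s / 2.0    # store s/2 when the sine is smaller
    return 2.0 * math.copysign(1.0, s) / c        # store 2/c when the cosine is smaller

def decode_rho(rho):
    """Reconstruct (c, s) of +Z or -Z from rho, following (5.1.10)."""
    if rho == 1.0:
        return 0.0, 1.0
    if abs(rho) < 1.0:
        s = 2.0 * rho
        return math.sqrt(1.0 - s * s), s
    c = 2.0 / rho
    return c, math.sqrt(1.0 - c * c)

for theta in (0.3, 2.5, -2.0):
    c, s = math.cos(theta), math.sin(theta)
    print((c, s), decode_rho(encode_rho(c, s)))
```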


5.1.12 Error Propagation 


We offer some remarks about the propagation of roundoff error in algorithms that involve sequences of Householder/Givens updates. To be precise, suppose $A = A_0 \in \mathbb{R}^{m\times n}$ is given and that matrices $A_1, \ldots, A_p = B$ are generated via the formula
$$A_k = fl(\hat{Q}_kA_{k-1}\hat{Z}_k), \qquad k = 1{:}p.$$
Assume that the above Householder and Givens algorithms are used for both the generation and application of the $\hat{Q}_k$ and $\hat{Z}_k$. Let $Q_k$ and $Z_k$ be the orthogonal matrices that would be produced in the absence of roundoff. It can be shown that
$$B = (Q_p\cdots Q_1)(A + E)(Z_1\cdots Z_p), \qquad (5.1.11)$$
where $\|E\|_2 \le c\,\mathbf{u}\|A\|_2$ and c is a constant that depends mildly on n, m, and p. In plain English, B is an exact orthogonal update of a matrix near to A.


5.1.13 Fast Givens Transformations 


The ability to introduce zeros in a selective fashion makes Givens rotations 
an important zeroing tool in certain structured problems. This has led to 
the development of "fast Givens" procedures. The fast Givens idea amounts
to a clever representation of Q when Q is the product of Givens rotations. 




In particular, Q is represented by a matrix pair (M, D) where $M^TM = D = \operatorname{diag}(d_i)$ and each $d_i$ is positive. The matrices Q, M, and D are connected through the formula
$$Q = MD^{-1/2} = M\operatorname{diag}(1/\sqrt{d_i}).$$
Note that $(MD^{-1/2})^T(MD^{-1/2}) = D^{-1/2}DD^{-1/2} = I$ and so the matrix $MD^{-1/2}$ is orthogonal. Moreover, if F is an n-by-n matrix with $F^TDF = D_{new}$ diagonal, then $M_{new}^TM_{new} = D_{new}$ where $M_{new} = MF$. Thus, it is possible to update the fast Givens representation (M, D) to obtain $(M_{new}, D_{new})$. For this idea to be of practical interest, we must show how to give F zeroing capabilities subject to the constraint that it "keeps" D diagonal.

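The details are best explained at the 2-by-2 level. Let $x = [x_1\; x_2]^T$ and $D = \operatorname{diag}(d_1, d_2)$ be given and assume that $d_1$ and $d_2$ are positive. Define
$$M_1 = \begin{bmatrix} \beta_1 & 1 \\ 1 & \alpha_1 \end{bmatrix} \qquad (5.1.12)$$
and observe that
$$M_1^Tx = \begin{bmatrix} \beta_1x_1 + x_2 \\ x_1 + \alpha_1x_2 \end{bmatrix}$$
and
$$M_1^TDM_1 = \begin{bmatrix} d_2 + \beta_1^2d_1 & d_1\beta_1 + d_2\alpha_1 \\ d_1\beta_1 + d_2\alpha_1 & d_1 + \alpha_1^2d_2 \end{bmatrix} \equiv D_1.$$
If $x_2 \ne 0$, $\alpha_1 = -x_1/x_2$, and $\beta_1 = -\alpha_1d_2/d_1$, then
$$M_1^Tx = \begin{bmatrix} x_2(1 + \gamma_1) \\ 0 \end{bmatrix}, \qquad M_1^TDM_1 = \begin{bmatrix} d_2(1 + \gamma_1) & 0 \\ 0 & d_1(1 + \gamma_1) \end{bmatrix}$$
where $\gamma_1 = -\alpha_1\beta_1 = (d_2/d_1)(x_1/x_2)^2$.
Analogously, if we assume $x_1 \ne 0$ and define $M_2$ by
$$M_2 = \begin{bmatrix} 1 & \alpha_2 \\ \beta_2 & 1 \end{bmatrix} \qquad (5.1.13)$$
where $\alpha_2 = -x_2/x_1$ and $\beta_2 = -(d_1/d_2)\alpha_2$, then
$$M_2^Tx = \begin{bmatrix} x_1(1 + \gamma_2) \\ 0 \end{bmatrix}, \qquad M_2^TDM_2 = \begin{bmatrix} d_1(1 + \gamma_2) & 0 \\ 0 & d_2(1 + \gamma_2) \end{bmatrix} \equiv D_2$$
where $\gamma_2 = -\alpha_2\beta_2 = (d_1/d_2)(x_2/x_1)^2$.

It is easy to show that for either i = 1 or 2, the matrix $J = D^{1/2}M_iD_i^{-1/2}$ is orthogonal and that it is designed so that the second component of $J^T(D^{-1/2}x)$ is zero. (J may actually be a reflection and thus it is half-correct to use the popular term "fast Givens.")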

Notice that the $\gamma_i$ satisfy $\gamma_1\gamma_2 = 1$. Thus, we can always select $M_i$ in the above so that the "growth factor" $(1 + \gamma_i)$ is bounded by 2. Matrices of the form
$$M_1 = \begin{bmatrix} \beta_1 & 1 \\ 1 & \alpha_1 \end{bmatrix}, \qquad M_2 = \begin{bmatrix} 1 & \alpha_2 \\ \beta_2 & 1 \end{bmatrix}$$
that satisfy $-1 \le \alpha_i\beta_i \le 0$ are 2-by-2 fast Givens transformations. Notice that premultiplication by a fast Givens transformation involves half the number of multiplies as premultiplication by an "ordinary" Givens transformation. Also, the zeroing is carried out without an explicit square root.

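In the n-by-n case, everything "scales up" as with ordinary Givens rotations. The "type 1" transformations have the form
$$F(i,k,\alpha,\beta) = \begin{bmatrix}
1 & \cdots & 0 & \cdots & 0 & \cdots & 0 \\
\vdots & \ddots & \vdots & & \vdots & & \vdots \\
0 & \cdots & \beta & \cdots & 1 & \cdots & 0 \\
\vdots & & \vdots & \ddots & \vdots & & \vdots \\
0 & \cdots & 1 & \cdots & \alpha & \cdots & 0 \\
\vdots & & \vdots & & \vdots & \ddots & \vdots \\
0 & \cdots & 0 & \cdots & 0 & \cdots & 1
\end{bmatrix} \qquad (5.1.14)$$
while the "type 2" transformations are structured as follows:
$$F(i,k,\alpha,\beta) = \begin{bmatrix}
1 & \cdots & 0 & \cdots & 0 & \cdots & 0 \\
\vdots & \ddots & \vdots & & \vdots & & \vdots \\
0 & \cdots & 1 & \cdots & \alpha & \cdots & 0 \\
\vdots & & \vdots & \ddots & \vdots & & \vdots \\
0 & \cdots & \beta & \cdots & 1 & \cdots & 0 \\
\vdots & & \vdots & & \vdots & \ddots & \vdots \\
0 & \cdots & 0 & \cdots & 0 & \cdots & 1
\end{bmatrix} \qquad (5.1.15)$$
In both cases the nontrivial 2-by-2 pattern occupies rows and columns i and k.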


Encapsulating all this we obtain 


Algorithm 5.1.4 Given $x \in \mathbb{R}^2$ and positive $d \in \mathbb{R}^2$, the following algorithm computes a 2-by-2 fast Givens transformation M such that the second component of $M^Tx$ is zero and $M^TDM = D_1$ is diagonal where $D = \operatorname{diag}(d_1, d_2)$. If type = 1 then M has the form (5.1.12) while if type = 2 then M has the form (5.1.13). The diagonal elements of $D_1$ overwrite d.

    function: [α, β, type] = fast.givens(x, d)
        if x(2) ≠ 0
            α = -x(1)/x(2); β = -αd(2)/d(1); γ = -αβ
            if γ ≤ 1
                type = 1
                τ = d(1); d(1) = (1 + γ)d(2); d(2) = (1 + γ)τ
            else
                type = 2
                α = 1/α; β = 1/β; γ = 1/γ
                d(1) = (1 + γ)d(1); d(2) = (1 + γ)d(2)
            end
        else
            type = 2
            α = 0; β = 0
        end
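A Python sketch of Algorithm 5.1.4 follows (standard library only). The name `fast_givens` replaces `fast.givens`, which is not a valid Python identifier, and the flag is called `kind` because `type` is a Python builtin; `fast_matrix` is a hypothetical helper that rebuilds M from its parameters.

```python
def fast_givens(x, d):
    """Sketch of Algorithm 5.1.4 (fast.givens in the text).

    x, d: length-2 lists with d[0], d[1] > 0. Returns (alpha, beta, kind) and
    overwrites d with the diagonal of D_1 = M^T D M, where
    kind 1: M = [[beta, 1], [1, alpha]]   (5.1.12)
    kind 2: M = [[1, alpha], [beta, 1]]   (5.1.13)
    """
    if x[1] != 0:
        alpha = -x[0] / x[1]
        beta = -alpha * d[1] / d[0]
        gamma = -alpha * beta
        if gamma <= 1:
            t = d[0]
            d[0] = (1 + gamma) * d[1]
            d[1] = (1 + gamma) * t
            return alpha, beta, 1
        alpha, beta, gamma = 1 / alpha, 1 / beta, 1 / gamma
        d[0] = (1 + gamma) * d[0]
        d[1] = (1 + gamma) * d[1]
        return alpha, beta, 2
    return 0.0, 0.0, 2

def fast_matrix(alpha, beta, kind):
    """The 2-by-2 matrix M described by (alpha, beta, kind)."""
    return [[beta, 1.0], [1.0, alpha]] if kind == 1 else [[1.0, alpha], [beta, 1.0]]

d = [1.0, 2.0]
alpha, beta, kind = fast_givens([3.0, 4.0], d)
print(kind, fast_matrix(alpha, beta, kind), d)
```

No square root is taken, and the selection between the two types keeps the growth factor $1 + \gamma$ at most 2.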


The application of fast Givens transformations is analogous to that for ordinary Givens transformations. Even with the appropriate type of transformation used, the growth factor 1 + γ may still be as large as two. Thus, $2^s$ growth can occur in the entries of D and M after s updates. This means that the diagonal D must be monitored during a fast Givens procedure to avoid overflow. See Anda and Park (1994) for how to do this efficiently.

Nevertheless, element growth in M and D is controlled because at all times we have $MD^{-1/2}$ orthogonal. The roundoff properties of a fast Givens procedure are what we would expect of a Givens matrix technique. For example, if we computed $\hat{Q} = fl(\hat{M}\hat{D}^{-1/2})$ where $\hat{M}$ and $\hat{D}$ are the computed M and D, then $\hat{Q}$ is orthogonal to working precision: $\|\hat{Q}^T\hat{Q} - I\|_2 \approx \mathbf{u}$.


Problems 


P5.1.1 Execute house with $x = [1, 7, 2, 3, -1]^T$.

P5.1.2 Let x and y be nonzero vectors in $\mathbb{R}^n$. Give an algorithm for determining a Householder matrix P such that Px is a multiple of y.

P5.1.3 Suppose $x \in \mathbb{C}^n$ and that $x_1 = |x_1|e^{i\theta}$ with $\theta \in \mathbb{R}$. Assume $x \ne 0$ and define $u = x + e^{i\theta}\|x\|_2e_1$. Show that $P = I - 2uu^H/u^Hu$ is unitary and that $Px = -e^{i\theta}\|x\|_2e_1$.

P5.1.4 Use Householder matrices to show that $\det(I + xy^T) = 1 + x^Ty$ where x and y are given n-vectors.

P5.1.5 Suppose $x \in \mathbb{C}^2$. Give an algorithm for determining a unitary matrix of the form
$$Q = \begin{bmatrix} c & s \\ -\bar{s} & c \end{bmatrix}, \qquad c \in \mathbb{R}, \quad c^2 + |s|^2 = 1$$
such that the second component of $Q^Hx$ is zero.

P5.1.6 Suppose x and y are unit vectors in $\mathbb{R}^n$. Give an algorithm using Givens transformations which computes an orthogonal Q such that $Q^Tx = y$.

P5.1.7 Determine c = cos(θ) and s = sin(θ) such that
$$\begin{bmatrix} c & s \\ -s & c \end{bmatrix}^T \begin{bmatrix} \ldots \end{bmatrix} = \begin{bmatrix} \ldots \end{bmatrix}.$$

P5.1.8 Suppose that $Q = I + YTY^T$ is orthogonal where $Y \in \mathbb{R}^{n\times j}$ and $T \in \mathbb{R}^{j\times j}$ is upper triangular. Show that if $Q_+ = QP$ where $P = I - 2vv^T/v^Tv$ is a Householder matrix, then $Q_+$ can be expressed in the form $Q_+ = I + Y_+T_+Y_+^T$ where $Y_+ \in \mathbb{R}^{n\times(j+1)}$ and $T_+ \in \mathbb{R}^{(j+1)\times(j+1)}$ is upper triangular.

P5.1.9 Give a detailed implementation of Algorithm 5.1.2 with the assumption that $v^{(j)}$(j+1:n), the essential part of the jth Householder vector, is stored in A(j+1:n, j). Since Y is effectively represented in A, your procedure need only set up the W matrix.

P5.1.10 Show that if S is skew-symmetric ($S^T = -S$), then $Q = (I + S)(I - S)^{-1}$ is orthogonal. (Q is called the Cayley transform of S.) Construct a rank-2 S so that if x is a vector then Qx is zero except in the first component.

P5.1.11 Suppose $P \in \mathbb{R}^{m\times n}$ satisfies $\|P^TP - I_n\|_2 = \epsilon < 1$. Show that all the singular values of P are in the interval $[1 - \epsilon, 1 + \epsilon]$ and that $\|P - UV^T\|_2 \le \epsilon$ where $P = U\Sigma V^T$ is the SVD of P.

P5.1.12 Suppose $A \in \mathbb{R}^{2\times 2}$. Under what conditions is the closest rotation to A closer than the closest reflection to A?
5.2 The QR Factorization


We now show how Householder and Givens transformations can be used to 
compute various factorizations, beginning with the QR factorization. The 
QR factorization of an m-by-n matrix A is given by 


$$A = QR$$
where $Q \in \mathbb{R}^{m\times m}$ is orthogonal and $R \in \mathbb{R}^{m\times n}$ is upper triangular. In this section we assume $m \ge n$. We will see that if A has full column rank,
then the first n columns of Q form an orthonormal basis for ran(A). Thus, 
calculation of the QR factorization is one way to compute an orthonormal 
basis for a set of vectors. This computation can be arranged in several ways. 
We give methods based on Householder, block Householder, Givens, and 
fast Givens transformations. The Gram-Schmidt orthogonalization process 
and a numerically more stable variant called modified Gram-Schmidt are 
also discussed. 
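5.2.1 Householder QR

We begin with a QR factorization method that utilizes Householder transformations. The essence of the algorithm can be conveyed by a small example. Suppose m = 6, n = 5, and assume that Householder matrices $H_1$ and $H_2$ have been computed so that
$$H_2H_1A = \begin{bmatrix} \times&\times&\times&\times&\times \\ 0&\times&\times&\times&\times \\ 0&0&\otimes&\times&\times \\ 0&0&\otimes&\times&\times \\ 0&0&\otimes&\times&\times \\ 0&0&\otimes&\times&\times \end{bmatrix}.$$
Concentrating on the highlighted entries (marked $\otimes$), we determine a Householder matrix $\tilde{H}_3 \in \mathbb{R}^{4\times 4}$ such that
$$\tilde{H}_3\begin{bmatrix} \otimes \\ \otimes \\ \otimes \\ \otimes \end{bmatrix} = \begin{bmatrix} \times \\ 0 \\ 0 \\ 0 \end{bmatrix}.$$
If $H_3 = \operatorname{diag}(I_2, \tilde{H}_3)$, then
$$H_3H_2H_1A = \begin{bmatrix} \times&\times&\times&\times&\times \\ 0&\times&\times&\times&\times \\ 0&0&\times&\times&\times \\ 0&0&0&\times&\times \\ 0&0&0&\times&\times \\ 0&0&0&\times&\times \end{bmatrix}.$$
After n such steps we obtain an upper triangular $H_nH_{n-1}\cdots H_1A = R$ and so by setting $Q = H_1\cdots H_n$ we obtain $A = QR$.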


Algorithm 5.2.1 (Householder QR) Given $A \in \mathbb{R}^{m\times n}$ with $m \ge n$, the following algorithm finds Householder matrices $H_1, \ldots, H_n$ such that if $Q = H_1\cdots H_n$, then $Q^TA = R$ is upper triangular. The upper triangular part of A is overwritten by the upper triangular part of R and components j+1:m of the jth Householder vector are stored in A(j+1:m, j), j < m.

    for j = 1:n
        [v, β] = house(A(j:m, j))
        A(j:m, j:n) = (I_{m-j+1} - βvv^T) A(j:m, j:n)
        if j < m
            A(j+1:m, j) = v(2:m-j+1)
        end
    end
This algorithm requires $2n^2(m - n/3)$ flops.
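Algorithm 5.2.1 can be sketched in pure Python as follows. This is an illustrative transcription under stated assumptions: nested lists, 0-based indices (so the text's test j < m becomes j < m - 1), and the β values returned in a separate list, since the algorithm as stated does not store them in A. The function name `householder_qr` is hypothetical.

```python
import math

def house(x):
    """Householder vector following Algorithm 5.1.1: v[0] = 1, P = I - beta v v^T."""
    sigma = sum(t * t for t in x[1:])
    if sigma == 0.0:
        return [1.0] + list(x[1:]), 0.0
    mu = math.sqrt(x[0] ** 2 + sigma)
    v0 = x[0] - mu if x[0] <= 0 else -sigma / (x[0] + mu)
    return [1.0] + [t / v0 for t in x[1:]], 2.0 * v0 ** 2 / (sigma + v0 ** 2)

def householder_qr(A):
    """Sketch of Algorithm 5.2.1: overwrite A (m-by-n, m >= n) so that its upper
    triangle holds R and A[j+1:m][j] holds the essential part of v^(j).
    Returns the list of betas."""
    m, n = len(A), len(A[0])
    betas = []
    for j in range(n):
        v, beta = house([A[i][j] for i in range(j, m)])
        for col in range(j, n):                  # A(j:m, j:n) = (I - beta v v^T) A(j:m, j:n)
            s = beta * sum(v[t] * A[j + t][col] for t in range(m - j))
            for t in range(m - j):
                A[j + t][col] -= s * v[t]
        if j < m - 1:                            # store the essential part of v
            for t in range(1, m - j):
                A[j + t][j] = v[t]
        betas.append(beta)
    return betas

A0 = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
A = [row[:] for row in A0]
betas = householder_qr(A)
print(A, betas)
```

Since house produces $Px = \|x\|_2e_1$, the diagonal of R comes out positive here, and $R^TR = A^TA$ provides a simple check.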
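To clarify how A is overwritten, if
$$v^{(j)} = [\underbrace{0, \ldots, 0}_{j-1}, 1, v^{(j)}_{j+1}, \ldots, v^{(j)}_m]^T$$
is the jth Householder vector, then upon completion (for the case m = 6, n = 5)
$$A = \begin{bmatrix} r_{11} & r_{12} & r_{13} & r_{14} & r_{15} \\ v^{(1)}_2 & r_{22} & r_{23} & r_{24} & r_{25} \\ v^{(1)}_3 & v^{(2)}_3 & r_{33} & r_{34} & r_{35} \\ v^{(1)}_4 & v^{(2)}_4 & v^{(3)}_4 & r_{44} & r_{45} \\ v^{(1)}_5 & v^{(2)}_5 & v^{(3)}_5 & v^{(4)}_5 & r_{55} \\ v^{(1)}_6 & v^{(2)}_6 & v^{(3)}_6 & v^{(4)}_6 & v^{(5)}_6 \end{bmatrix}.$$
If the matrix $Q = H_1\cdots H_n$ is required, then it can be accumulated using (5.1.5). This accumulation requires $4(m^2n - mn^2 + n^3/3)$ flops.

The computed upper triangular matrix $\hat{R}$ is the exact R for a nearby A in the sense that $Z^T(A + E) = \hat{R}$ where Z is some exact orthogonal matrix and $\|E\|_2 \approx \mathbf{u}\|A\|_2$.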


5.2.2 Block Householder QR Factorization


Algorithm 5.2.1 is rich in the level-2 operations of matrix-vector multi- 
plication and outer product updates. By reorganizing the computation 
and using the block Householder representation discussed in §5.1.7 we can 
obtain a level-3 procedure. The idea is to apply clusters of Householder 
transformations that are represented in the WY form of §5.1.7. 

A small example illustrates the main idea. Suppose n = 12 and that the "blocking parameter" r has the value r = 3. The first step is to generate Householders $H_1$, $H_2$, and $H_3$ as in Algorithm 5.2.1. However, unlike Algorithm 5.2.1 where the $H_i$ are applied to all of A, we only apply $H_1$, $H_2$, and $H_3$ to A(:, 1:3). After this is accomplished we generate the block representation $H_1H_2H_3 = I + W_1Y_1^T$ and then perform the level-3 update
$$A(:, 4{:}12) = (I + W_1Y_1^T)^TA(:, 4{:}12).$$
Next, we generate $H_4$, $H_5$, and $H_6$ as in Algorithm 5.2.1. However, these transformations are not applied to A(:, 7:12) until their block representation $H_4H_5H_6 = I + W_2Y_2^T$ is found. This illustrates the general pattern.
    λ = 1; k = 0
    while λ ≤ n
        τ = min(λ + r - 1, n); k = k + 1
        Using Algorithm 5.2.1, upper triangularize A(λ:m, λ:τ)
            generating Householder matrices H_λ, ..., H_τ.            (5.2.1)
        Use Algorithm 5.1.2 to get the block representation
            I + W_kY_k^T = H_λ···H_τ.
        A(λ:m, τ+1:n) = (I + W_kY_k^T)^T A(λ:m, τ+1:n)
        λ = τ + 1
    end


The zero-nonzero structure of the Householder vectors that define the matrices $H_\lambda, \ldots, H_\tau$ implies that the first λ - 1 rows of $W_k$ and $Y_k$ are zero. This fact would be exploited in a practical implementation.

The proper way to regard (5.2.1) is through the partitioning
$$A = [A_1, \ldots, A_N], \qquad N = \lceil n/r \rceil$$
where block column $A_k$ is processed during the kth step. In the kth step of (5.2.1), a block Householder is formed that zeros the subdiagonal portion of $A_k$. The remaining block columns are then updated.

The roundoff properties of (5.2.1) are essentially the same as those for Algorithm 5.2.1. There is a slight increase in the number of flops required because of the W-matrix computations. However, as a result of the blocking, all but a small fraction of the flops occur in the context of matrix multiplication. In particular, the level-3 fraction of (5.2.1) is approximately 1 - 2/N. See Bischof and Van Loan (1987) for further details.
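Procedure (5.2.1) can be sketched in pure Python as below. This is a simplified illustration, not a tuned level-3 code: each panel is triangularized with Householder reflections applied to the panel only, the reflections are gathered into $I + WY^T$ following Algorithm 5.1.2, and the trailing columns get the single update $A \leftarrow A + Y(W^TA)$. Unlike a practical implementation, it stores W and Y as full-length columns (the zero rows are not exploited) and does not save the Householder vectors in A. The name `block_householder_qr` is hypothetical and indices are 0-based.

```python
import math

def house(x):
    """Householder vector following Algorithm 5.1.1: v[0] = 1, P = I - beta v v^T."""
    sigma = sum(t * t for t in x[1:])
    if sigma == 0.0:
        return [1.0] + list(x[1:]), 0.0
    mu = math.sqrt(x[0] ** 2 + sigma)
    v0 = x[0] - mu if x[0] <= 0 else -sigma / (x[0] + mu)
    return [1.0] + [t / v0 for t in x[1:]], 2.0 * v0 ** 2 / (sigma + v0 ** 2)

def block_householder_qr(A, r):
    """Sketch of (5.2.1): overwrite A (m-by-n, m >= n) so that its upper triangle is R."""
    m, n = len(A), len(A[0])
    lam = 0                                     # 0-based start of the current panel
    while lam < n:
        tau = min(lam + r, n)                   # panel = columns lam .. tau-1
        W, Y = [], []                           # lists of length-m columns
        for j in range(lam, tau):
            v, beta = house([A[i][j] for i in range(j, m)])
            v = [0.0] * j + v                   # embed v in R^m
            for col in range(j, tau):           # apply H_j to the panel only
                s = beta * sum(v[i] * A[i][col] for i in range(j, m))
                for i in range(j, m):
                    A[i][col] -= s * v[i]
            # Algorithm 5.1.2: z = -beta (I + W Y^T) v, W = [W z], Y = [Y v]
            ytv = [sum(y[i] * v[i] for i in range(m)) for y in Y]
            z = [-beta * (v[i] + sum(c * w[i] for c, w in zip(ytv, W))) for i in range(m)]
            W.append(z)
            Y.append(v)
        for col in range(tau, n):               # A(:, tau:n) <- A + Y (W^T A)
            wta = [sum(w[i] * A[i][col] for i in range(m)) for w in W]
            for i in range(m):
                A[i][col] += sum(y[i] * c for y, c in zip(Y, wta))
        lam = tau
    return A

A0 = [[6.0, 1.0, 2.0, 1.0], [2.0, 5.0, 1.0, 0.0], [1.0, 2.0, 6.0, 1.0],
      [3.0, 0.0, 1.0, 7.0], [1.0, 1.0, 1.0, 1.0]]
R2 = block_householder_qr([row[:] for row in A0], 2)
R4 = block_householder_qr([row[:] for row in A0], 4)   # one panel: no blocking
print(R2)
```

With a single panel (r = n) this reduces to Householder QR without the trailing update, so comparing r = 2 with r = 4 is a convenient consistency check.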


5.2.3 Givens QR Methods 


Givens rotations can also be used to compute the QR factorization. The 
4-by-3 case illustrates the general idea: 


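$$\begin{bmatrix} \times&\times&\times \\ \times&\times&\times \\ \otimes&\times&\times \\ \otimes&\times&\times \end{bmatrix} \xrightarrow{(3,4)} \begin{bmatrix} \times&\times&\times \\ \otimes&\times&\times \\ \otimes&\times&\times \\ 0&\times&\times \end{bmatrix} \xrightarrow{(2,3)} \begin{bmatrix} \otimes&\times&\times \\ \otimes&\times&\times \\ 0&\times&\times \\ 0&\times&\times \end{bmatrix} \xrightarrow{(1,2)}$$
$$\begin{bmatrix} \times&\times&\times \\ 0&\times&\times \\ 0&\otimes&\times \\ 0&\otimes&\times \end{bmatrix} \xrightarrow{(3,4)} \begin{bmatrix} \times&\times&\times \\ 0&\otimes&\times \\ 0&\otimes&\times \\ 0&0&\times \end{bmatrix} \xrightarrow{(2,3)} \begin{bmatrix} \times&\times&\times \\ 0&\times&\times \\ 0&0&\otimes \\ 0&0&\otimes \end{bmatrix} \xrightarrow{(3,4)} \begin{bmatrix} \times&\times&\times \\ 0&\times&\times \\ 0&0&\times \\ 0&0&0 \end{bmatrix} = R$$
Here we have highlighted (with $\otimes$) the 2-vectors that define the underlying Givens rotations. Clearly, if $G_j$ denotes the jth Givens rotation in the reduction, then $Q^TA = R$ is upper triangular where $Q = G_1\cdots G_t$ and t is the total number of rotations. For general m and n we have: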


Algorithm 5.2.2 (Givens QR) Given $A \in \mathbb{R}^{m\times n}$ with $m \ge n$, the following algorithm overwrites A with $Q^TA = R$, where R is upper triangular and Q is orthogonal.

    for j = 1:n
        for i = m:-1:j+1
            [c, s] = givens(A(i-1, j), A(i, j))
            A(i-1:i, j:n) = [c s; -s c]^T A(i-1:i, j:n)
        end
    end


This algorithm requires $3n^2(m - n/3)$ flops. Note that we could use (5.1.9) to encode (c, s) in a single number ρ which could then be stored in the zeroed entry A(i, j). An operation such as $x \leftarrow Q^Tx$ could then be implemented by using (5.1.10), taking care to reconstruct the rotations in the proper order.
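Algorithm 5.2.2 can be sketched in pure Python as follows (nested lists, 0-based indices; `givens_qr` is a hypothetical name). The rotations are applied and discarded; storing them via (5.1.9) is left out to keep the sketch short.

```python
import math

def givens(a, b):
    """Algorithm 5.1.3: (c, s) with [c s; -s c]^T [a; b] = [r; 0]."""
    if b == 0:
        return 1.0, 0.0
    if abs(b) > abs(a):
        tau = -a / b
        s = 1.0 / math.sqrt(1.0 + tau * tau)
        return s * tau, s
    tau = -b / a
    c = 1.0 / math.sqrt(1.0 + tau * tau)
    return c, c * tau

def givens_qr(A):
    """Sketch of Algorithm 5.2.2: overwrite A (m-by-n, m >= n) with R = Q^T A,
    zeroing column j from the bottom up with rotations in the (i-1, i) planes."""
    m, n = len(A), len(A[0])
    for j in range(n):
        for i in range(m - 1, j, -1):           # i = m:-1:j+1 in 1-based terms
            c, s = givens(A[i - 1][j], A[i][j])
            for col in range(j, n):
                t1, t2 = A[i - 1][col], A[i][col]
                A[i - 1][col] = c * t1 - s * t2
                A[i][col] = s * t1 + c * t2
    return A

A0 = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
R = givens_qr([row[:] for row in A0])
print(R)
```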

Other sequences of rotations can be used to upper triangularize A. For 
example, if we replace the for statements in Algorithm 5.2.2 with 


    for i = m:-1:2
        for j = 1:min{i - 1, n}


then the zeros in A are introduced row-by-row. 

Another parameter in a Givens QR procedure concerns the planes of rotation that are involved in the zeroing of each $a_{ij}$. For example, instead of rotating rows i - 1 and i to zero $a_{ij}$ as in Algorithm 5.2.2, we could use rows j and i:

    for j = 1:n
        for i = m:-1:j+1
            [c, s] = givens(A(j, j), A(i, j))
            A([j i], j:n) = [c s; -s c]^T A([j i], j:n)
        end
    end


5.2.4 Hessenberg QR via Givens


As an example of how Givens rotations can be used in structured problems, 
we show how they can be employed to compute the QR factorization of an 
upper Hessenberg matrix. A small example illustrates the general idea. 
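Suppose n = 6 and that after two steps we have computed
$$G(2,3,\theta_2)^TG(1,2,\theta_1)^TA = \begin{bmatrix} \times&\times&\times&\times&\times&\times \\ 0&\times&\times&\times&\times&\times \\ 0&0&\times&\times&\times&\times \\ 0&0&\times&\times&\times&\times \\ 0&0&0&\times&\times&\times \\ 0&0&0&0&\times&\times \end{bmatrix}.$$
We then compute $G(3,4,\theta_3)$ to zero the current (4,3) entry thereby obtaining
$$G(3,4,\theta_3)^TG(2,3,\theta_2)^TG(1,2,\theta_1)^TA = \begin{bmatrix} \times&\times&\times&\times&\times&\times \\ 0&\times&\times&\times&\times&\times \\ 0&0&\times&\times&\times&\times \\ 0&0&0&\times&\times&\times \\ 0&0&0&\times&\times&\times \\ 0&0&0&0&\times&\times \end{bmatrix}.$$
Overall we have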
Algorithm 5.2.3 (Hessenberg QR) If $A \in \mathbb{R}^{n\times n}$ is upper Hessenberg, then the following algorithm overwrites A with $Q^TA = R$ where Q is orthogonal and R is upper triangular. $Q = G_1\cdots G_{n-1}$ is a product of Givens rotations where $G_j$ has the form $G_j = G(j, j+1, \theta_j)$.

    for j = 1:n-1
        [c, s] = givens(A(j, j), A(j+1, j))
        A(j:j+1, j:n) = [c s; -s c]^T A(j:j+1, j:n)
    end

This algorithm requires about $3n^2$ flops.
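A pure-Python sketch of Algorithm 5.2.3 follows (0-based indices; `hessenberg_qr` is a hypothetical name). Each step touches only rows j and j+1 from column j onward, which is where the $3n^2$ count comes from; the (c, s) pairs are returned so that Q could be applied later.

```python
import math

def givens(a, b):
    """Algorithm 5.1.3: (c, s) with [c s; -s c]^T [a; b] = [r; 0]."""
    if b == 0:
        return 1.0, 0.0
    if abs(b) > abs(a):
        tau = -a / b
        s = 1.0 / math.sqrt(1.0 + tau * tau)
        return s * tau, s
    tau = -b / a
    c = 1.0 / math.sqrt(1.0 + tau * tau)
    return c, c * tau

def hessenberg_qr(A):
    """Sketch of Algorithm 5.2.3: overwrite upper Hessenberg A (n-by-n) with R = Q^T A,
    where Q = G_1 ... G_{n-1} and G_j acts in the (j, j+1) plane."""
    n = len(A)
    rotations = []
    for j in range(n - 1):
        c, s = givens(A[j][j], A[j + 1][j])
        for col in range(j, n):
            t1, t2 = A[j][col], A[j + 1][col]
            A[j][col] = c * t1 - s * t2
            A[j + 1][col] = s * t1 + c * t2
        rotations.append((c, s))
    return rotations

A0 = [[4.0, 1.0, 2.0], [3.0, 5.0, 1.0], [0.0, 2.0, 6.0]]
A = [row[:] for row in A0]
rots = hessenberg_qr(A)
print(A)
```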


5.2.5 Fast Givens QR

We can use the fast Givens transformations described in §5.1.13 to compute an (M, D) representation of Q. In particular, if M is nonsingular and D is diagonal such that $M^TA = T$ is upper triangular and $M^TM = D$ is diagonal, then $Q = MD^{-1/2}$ is orthogonal and $Q^TA = D^{-1/2}T \equiv R$ is upper triangular. Analogous to the Givens QR procedure we have:

Algorithm 5.2.4 (Fast Givens QR) Given $A \in \mathbb{R}^{m\times n}$ with $m \ge n$, the following algorithm computes nonsingular $M \in \mathbb{R}^{m\times m}$ and positive d(1:m) such that $M^TA = T$ is upper triangular, and $M^TM = \operatorname{diag}(d_1, \ldots, d_m)$. A is overwritten by T. Note: $A = (MD^{-1/2})(D^{-1/2}T)$ is a QR factorization of A.
    for i = 1:m
        d(i) = 1
    end
    for j = 1:n
        for i = m:-1:j+1
            [α, β, type] = fast.givens(A(i-1:i, j), d(i-1:i))
            if type = 1
                A(i-1:i, j:n) = [β 1; 1 α]^T A(i-1:i, j:n)
            else
                A(i-1:i, j:n) = [1 α; β 1]^T A(i-1:i, j:n)
            end
        end
    end


This algorithm requires $2n^2(m - n/3)$ flops. As we mentioned in the previous section, it is necessary to guard against overflow in fast Givens algorithms such as the above. This means that M, D, and A must be periodically scaled if their entries become large.
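Algorithm 5.2.4 can be sketched in pure Python as below. It is an illustration only: `fast_givens` stands in for `fast.givens` (with the flag renamed `kind`), M itself is not accumulated, and, unlike a practical code, nothing is rescaled to guard against overflow. Dividing row i of T by $\sqrt{d_i}$ recovers $R = D^{-1/2}T$.

```python
import math

def fast_givens(x, d):
    """Algorithm 5.1.4 (fast.givens in the text); overwrites d with diag(M^T D M)."""
    if x[1] != 0:
        alpha = -x[0] / x[1]
        beta = -alpha * d[1] / d[0]
        gamma = -alpha * beta
        if gamma <= 1:
            d[0], d[1] = (1 + gamma) * d[1], (1 + gamma) * d[0]
            return alpha, beta, 1
        alpha, beta, gamma = 1 / alpha, 1 / beta, 1 / gamma
        d[0], d[1] = (1 + gamma) * d[0], (1 + gamma) * d[1]
        return alpha, beta, 2
    return 0.0, 0.0, 2

def fast_givens_qr(A):
    """Sketch of Algorithm 5.2.4: overwrite A (m-by-n, m >= n) with T = M^T A and
    return d with M^T M = diag(d). No square roots are taken."""
    m, n = len(A), len(A[0])
    d = [1.0] * m
    for j in range(n):
        for i in range(m - 1, j, -1):
            pair = [d[i - 1], d[i]]
            alpha, beta, kind = fast_givens([A[i - 1][j], A[i][j]], pair)
            d[i - 1], d[i] = pair
            for col in range(j, n):
                t1, t2 = A[i - 1][col], A[i][col]
                if kind == 1:      # M = [beta 1; 1 alpha]; rows <- M^T rows
                    A[i - 1][col], A[i][col] = beta * t1 + t2, t1 + alpha * t2
                else:              # M = [1 alpha; beta 1]; rows <- M^T rows
                    A[i - 1][col], A[i][col] = t1 + beta * t2, alpha * t1 + t2
    return d

A0 = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
T = [row[:] for row in A0]
d = fast_givens_qr(T)
R = [[T[i][j] / math.sqrt(d[i]) for j in range(2)] for i in range(4)]   # R = D^(-1/2) T
print(R[:2])
```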

If the QR factorization of a narrow band matrix is required, then the fast Givens approach is attractive because it involves no square roots. (We found $LDL^T$ preferable to Cholesky in the narrow band case for the same reason; see §4.3.6.) In particular, if $A \in \mathbb{R}^{m\times n}$ has upper bandwidth q and lower bandwidth p, then $Q^TA = R$ has upper bandwidth p + q. In this case Givens QR requires about O(np(p + q)) flops and O(np) square roots. Thus, the square roots are a significant portion of the overall computation if $p, q \ll n$.


5.2.6 Properties of the QR Factorization


The above algorithms "prove" that the QR factorization exists. Now we relate the columns of Q to ran(A) and $\operatorname{ran}(A)^\perp$ and examine the uniqueness question.


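Theorem 5.2.1 If A = QR is a QR factorization of a full column rank $A \in \mathbb{R}^{m\times n}$ and $A = [a_1, \ldots, a_n]$ and $Q = [q_1, \ldots, q_m]$ are column partitionings, then
$$\operatorname{span}\{a_1, \ldots, a_k\} = \operatorname{span}\{q_1, \ldots, q_k\}, \qquad k = 1{:}n.$$
In particular, if $Q_1 = Q(1{:}m, 1{:}n)$ and $Q_2 = Q(1{:}m, n+1{:}m)$ then
$$\operatorname{ran}(A) = \operatorname{ran}(Q_1), \qquad \operatorname{ran}(A)^\perp = \operatorname{ran}(Q_2)$$
and $A = Q_1R_1$ with $R_1 = R(1{:}n, 1{:}n)$.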
Proof. Comparing kth columns in A = QR we conclude that
$$a_k = \sum_{i=1}^{k} r_{ik}q_i \in \operatorname{span}\{q_1, \ldots, q_k\}. \qquad (5.2.2)$$
Thus, $\operatorname{span}\{a_1, \ldots, a_k\} \subseteq \operatorname{span}\{q_1, \ldots, q_k\}$. However, since rank(A) = n it follows that $\operatorname{span}\{a_1, \ldots, a_k\}$ has dimension k and so must equal $\operatorname{span}\{q_1, \ldots, q_k\}$. The rest of the theorem follows trivially. □


The matrices $Q_1 = Q(1{:}m, 1{:}n)$ and $Q_2 = Q(1{:}m, n+1{:}m)$ can be easily computed from a factored form representation of Q.

If A = QR is a QR factorization of $A \in \mathbb{R}^{m\times n}$ and $m \ge n$, then we refer to $A = Q(:, 1{:}n)R(1{:}n, 1{:}n)$ as the thin QR factorization. The next result addresses the uniqueness issue for the thin QR factorization.


Theorem 5.2.2 Suppose $A \in \mathbb{R}^{m\times n}$ has full column rank. The thin QR factorization
$$A = Q_1R_1$$
is unique where $Q_1 \in \mathbb{R}^{m\times n}$ has orthonormal columns and $R_1$ is upper triangular with positive diagonal entries. Moreover, $R_1 = G^T$ where G is the lower triangular Cholesky factor of $A^TA$.

Proof. Since $A^TA = (Q_1R_1)^T(Q_1R_1) = R_1^TR_1$ we see that $G = R_1^T$ is the Cholesky factor of $A^TA$. This factor is unique by Theorem 4.2.5. Since $Q_1 = AR_1^{-1}$ it follows that $Q_1$ is also unique. □


How are $Q_1$ and $R_1$ affected by perturbations in $A$? To answer this question we need to extend the notion of condition to rectangular matrices. Recall from §2.7.3 that the 2-norm condition of a square nonsingular matrix is the ratio of the largest and smallest singular values. For rectangular matrices with full column rank we continue with this definition:
$$A \in \mathbb{R}^{m\times n},\; \mathrm{rank}(A) = n \;\Rightarrow\; \kappa_2(A) = \frac{\sigma_{\max}(A)}{\sigma_{\min}(A)}.$$
If the columns of $A$ are nearly dependent, then $\kappa_2(A)$ is large. Stewart (1993) has shown that $O(\epsilon)$ relative error in $A$ induces $O(\epsilon\,\kappa_2(A))$ relative error in $R$ and $Q_1$.
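The rectangular definition of $\kappa_2(A)$ needs only the singular values. A minimal NumPy sketch (the function name `kappa2` and the test matrix are illustrative assumptions) computes it directly and compares it with `np.linalg.cond`, which for the default norm also uses the ratio of extreme singular values:

```python
import numpy as np

def kappa2(A):
    """2-norm condition of a full column rank A: sigma_max(A) / sigma_min(A)."""
    s = np.linalg.svd(A, compute_uv=False)   # min(m, n) singular values, in descending order
    return s[0] / s[-1]

# A 3-by-2 matrix whose columns are nearly dependent.
A = np.array([[1.0, 1.0],
              [1e-3, 0.0],
              [0.0, 1e-3]])
print(kappa2(A), np.linalg.cond(A))
```

For this matrix $\sigma_1 = \sqrt{2 + 10^{-6}}$ and $\sigma_2 = 10^{-3}$, so $\kappa_2(A) \approx 1.4\cdot 10^3$.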


5.2.7 Classical Gram-Schmidt 


We now discuss two alternative methods that can be used to compute the thin QR factorization $A = Q_1R_1$ directly. If $\mathrm{rank}(A) = n$, then equation (5.2.2) can be solved for $q_k$:
$$q_k = \Bigl(a_k - \sum_{i=1}^{k-1} r_{ik}\,q_i\Bigr)\Big/ r_{kk}.$$
Thus, we can think of $q_k$ as a unit 2-norm vector in the direction of
$$z_k = a_k - \sum_{i=1}^{k-1} r_{ik}\,q_i$$
where to ensure $z_k \in \mathrm{span}\{q_1, \ldots, q_{k-1}\}^\perp$ we choose
$$r_{ik} = q_i^Ta_k, \qquad i = 1{:}k-1.$$


This leads to the classical Gram-Schmidt (CGS) algorithm for computing $A = Q_1R_1$.

  R(1,1) = || A(:,1) ||_2
  Q(:,1) = A(:,1)/R(1,1)
  for k = 2:n
    R(1:k-1, k) = Q(1:m, 1:k-1)^T A(1:m, k)
    z = A(1:m, k) - Q(1:m, 1:k-1) R(1:k-1, k)          (5.2.3)
    R(k,k) = || z ||_2
    Q(1:m, k) = z/R(k,k)
  end
In the kth step of CGS, the kth columns of both Q and R are generated. 
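A direct NumPy transcription of (5.2.3) is sketched below; the function name `cgs` is hypothetical, and indices are 0-based. Note that every coefficient $r_{ik}$ in step $k$ is computed from the original column $a_k$, which is exactly what distinguishes CGS from the modified version.

```python
import numpy as np

def cgs(A):
    """Classical Gram-Schmidt (5.2.3): returns Q1 (m-by-n) and R1 (n-by-n) with A = Q1 R1."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    R[0, 0] = np.linalg.norm(A[:, 0])
    Q[:, 0] = A[:, 0] / R[0, 0]
    for k in range(1, n):
        # All r_ik, i < k, are inner products with the ORIGINAL column a_k.
        R[:k, k] = Q[:, :k].T @ A[:, k]
        z = A[:, k] - Q[:, :k] @ R[:k, k]
        R[k, k] = np.linalg.norm(z)
        Q[:, k] = z / R[k, k]
    return Q, R

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 4))
Q, R = cgs(A)
```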


5.2.8 Modified Gram-Schmidt 


Unfortunately, the CGS method has very poor numerical properties in that there is typically a severe loss of orthogonality among the computed $q_i$. Interestingly, a rearrangement of the calculation, known as modified Gram-Schmidt (MGS), yields a much sounder computational procedure. In the $k$th step of MGS, the $k$th column of $Q$ (denoted by $q_k$) and the $k$th row of $R$ (denoted by $r_k^T$) are determined. To derive the MGS method, define the matrix $A^{(k)} \in \mathbb{R}^{m\times(n-k+1)}$ by
$$A - \sum_{i=1}^{k-1} q_i r_i^T = \sum_{i=k}^{n} q_i r_i^T = [\,0 \;\; A^{(k)}\,]. \qquad (5.2.4)$$
It follows that if
$$A^{(k)} = [\,z \;\; B\,]$$
is partitioned so that $z$ is its first column and $B$ has $n-k$ columns, then $r_{kk} = \|z\|_2$, $q_k = z/r_{kk}$ and $(r_{k,k+1} \cdots r_{kn}) = q_k^TB$. We then compute the outer product $A^{(k+1)} = B - q_k(r_{k,k+1} \cdots r_{kn})$ and proceed to the next step. This completely describes the $k$th step of MGS.


Algorithm 5.2.5 (Modified Gram-Schmidt) Given $A \in \mathbb{R}^{m\times n}$ with $\mathrm{rank}(A) = n$, the following algorithm computes the factorization $A = Q_1R_1$ where $Q_1 \in \mathbb{R}^{m\times n}$ has orthonormal columns and $R_1 \in \mathbb{R}^{n\times n}$ is upper triangular.

  for k = 1:n
    R(k,k) = || A(1:m, k) ||_2
    Q(1:m, k) = A(1:m, k)/R(k,k)
    for j = k+1:n
      R(k,j) = Q(1:m, k)^T A(1:m, j)
      A(1:m, j) = A(1:m, j) - Q(1:m, k) R(k,j)
    end
  end

This algorithm requires $2mn^2$ flops. It is not possible to overwrite $A$ with both $Q_1$ and $R_1$. Typically, the MGS computation is arranged so that $A$ is overwritten by $Q_1$ and the matrix $R_1$ is stored in a separate array.
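Here is a NumPy sketch of Algorithm 5.2.5 (the name `mgs` is hypothetical). The inner $j$-loop is vectorized: row $k$ of $R$ is formed from the already-updated trailing columns, and $q_k$ is then removed from all of them with one outer-product update, i.e., $A^{(k+1)} = B - q_k(r_{k,k+1} \cdots r_{kn})$.

```python
import numpy as np

def mgs(A):
    """Modified Gram-Schmidt, row-oriented as in Algorithm 5.2.5."""
    A = np.array(A, dtype=float)      # working copy; its columns are overwritten
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for k in range(n):
        R[k, k] = np.linalg.norm(A[:, k])
        Q[:, k] = A[:, k] / R[k, k]
        # kth row of R from the UPDATED trailing columns (the j-loop, vectorized)
        R[k, k+1:] = Q[:, k] @ A[:, k+1:]
        A[:, k+1:] -= np.outer(Q[:, k], R[k, k+1:])
    return Q, R

rng = np.random.default_rng(2)
A = rng.standard_normal((8, 4))
Q, R = mgs(A)
```

This version keeps $Q$ and the working copy separate for clarity; as noted above, a storage-conscious code would overwrite $A$ with $Q_1$.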


5.2.9 Work and Accuracy 


If one is interested in computing an orthonormal basis for $\mathrm{ran}(A)$, then the Householder approach requires $2mn^2 - 2n^3/3$ flops to get $Q$ in factored form and another $2mn^2 - 2n^3/3$ flops to get the first $n$ columns of $Q$. (This requires "paying attention" to just the first $n$ columns of $Q$ in (5.1.5).) Therefore, for the problem of finding an orthonormal basis for $\mathrm{ran}(A)$, MGS is about twice as efficient as Householder orthogonalization. However, Björck (1967) has shown that MGS produces a computed $\hat Q_1 = [\,\hat q_1, \ldots, \hat q_n\,]$ that satisfies
$$\hat Q_1^T\hat Q_1 = I + E_{MGS}, \qquad \|E_{MGS}\|_2 \approx u\,\kappa_2(A),$$
whereas the corresponding result for the Householder approach is of the form
$$\hat Q_1^T\hat Q_1 = I + E_H, \qquad \|E_H\|_2 \approx u.$$
Thus, if orthonormality is critical, then MGS should be used to compute orthonormal bases only when the vectors to be orthogonalized are fairly independent.

We also mention that the computed triangular factor $\hat R$ produced by MGS satisfies $\|A - \hat Q\hat R\| \approx u\|A\|$ and that there exists a $Q$ with perfectly orthonormal columns such that $\|A - Q\hat R\| \approx u\|A\|$. See Higham (1996, p.379).
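The orthogonality results above can be observed numerically. The sketch below is an experiment under assumptions of our own choosing (a random $50\times 10$ matrix with singular values graded from $1$ to $10^{-8}$, so $\kappa_2(A) = 10^8$). It measures $\|\hat Q_1^T\hat Q_1 - I\|_2$ for CGS, MGS, and NumPy's Householder-based `np.linalg.qr`. The compact `cgs` and `mgs` helpers are restated so the block is self-contained.

```python
import numpy as np

def cgs(A):
    m, n = A.shape
    Q, R = np.zeros((m, n)), np.zeros((n, n))
    for k in range(n):
        R[:k, k] = Q[:, :k].T @ A[:, k]          # uses the original column a_k
        z = A[:, k] - Q[:, :k] @ R[:k, k]
        R[k, k] = np.linalg.norm(z)
        Q[:, k] = z / R[k, k]
    return Q, R

def mgs(A):
    V = np.array(A, dtype=float)
    m, n = V.shape
    Q, R = np.zeros((m, n)), np.zeros((n, n))
    for k in range(n):
        R[k, k] = np.linalg.norm(V[:, k])
        Q[:, k] = V[:, k] / R[k, k]
        R[k, k+1:] = Q[:, k] @ V[:, k+1:]         # uses the updated columns
        V[:, k+1:] -= np.outer(Q[:, k], R[k, k+1:])
    return Q, R

def loss(Q):
    """|| Q^T Q - I ||_2, the departure from orthonormality."""
    return np.linalg.norm(Q.T @ Q - np.eye(Q.shape[1]), 2)

rng = np.random.default_rng(0)
m, n = 50, 10
U, _ = np.linalg.qr(rng.standard_normal((m, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = U @ np.diag(np.logspace(0, -8, n)) @ V.T      # kappa_2(A) = 1e8

loss_cgs = loss(cgs(A)[0])
loss_mgs = loss(mgs(A)[0])
loss_h = loss(np.linalg.qr(A)[0])                 # Householder (LAPACK) QR
print(f"CGS {loss_cgs:.1e}   MGS {loss_mgs:.1e}   Householder {loss_h:.1e}")
```

One would expect MGS to lose orthogonality at roughly the $u\,\kappa_2(A)$ level, Householder to stay near $u$, and CGS to do much worse than either.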


Example 5.2.1 If modified Gram-Schmidt is applied to
$$A = \begin{bmatrix} 1 & 1 \\ 10^{-3} & 0 \\ 0 & 10^{-3} \end{bmatrix}, \qquad \kappa_2(A) \approx 1.4\cdot 10^{3}$$
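The example is cut off in this copy, so the following is only an independent sketch and not a reproduction of the book's printed result. It runs Algorithm 5.2.5 on the matrix above in 6-digit decimal arithmetic, emulated with Python's `decimal` module, whose context precision rounds every operation including `sqrt`. The helper name `mgs_decimal` is hypothetical. The computed inner product $\hat q_1^T\hat q_2$ comes out far above the unit roundoff $u = \frac{1}{2}10^{-5}$ of this arithmetic.

```python
from decimal import Decimal, localcontext

def mgs_decimal(cols, digits):
    """MGS (Algorithm 5.2.5) on a list of columns, in `digits`-digit decimal arithmetic."""
    with localcontext() as ctx:
        ctx.prec = digits                 # every +, -, *, / and sqrt rounds to `digits` digits
        a = [[Decimal(x) for x in c] for c in cols]
        q = []
        for k in range(len(a)):
            rkk = sum((x * x for x in a[k]), Decimal(0)).sqrt()
            q.append([x / rkk for x in a[k]])
            for j in range(k + 1, len(a)):
                rkj = sum((qi * aj for qi, aj in zip(q[k], a[j])), Decimal(0))
                a[j] = [aj - qi * rkj for qi, aj in zip(q[k], a[j])]
        return q

# Columns of A = [1 1; 1e-3 0; 0 1e-3]
q1, q2 = mgs_decimal([["1", "1e-3", "0"], ["1", "0", "1e-3"]], digits=6)
with localcontext() as ctx:
    ctx.prec = 6
    inner = sum((x * y for x, y in zip(q1, q2)), Decimal(0))   # q1^T q2, ideally 0
print(q1, q2, inner)
```

The damage is done in the first step: $1 + 10^{-6}$ rounds to $1$, so $\hat q_1$ is simply $a_1$ and is not quite of unit length.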
5.2.10 A Note on Complex QR


Most of the algorithms that we present in this book have complex versions that are fairly straightforward to derive from their real counterparts. (This is not to say that everything is easy and obvious at the implementation level.) As an illustration we outline what a complex Householder QR factorization algorithm looks like.

Starting at the level of an individual Householder transformation, suppose $0 \ne x \in \mathbb{C}^n$ and that $x_1 = re^{i\theta}$ where $r, \theta \in \mathbb{R}$. If $v = x \pm e^{i\theta}\|x\|_2e_1$ and $P = I_n - \beta vv^H$, $\beta = 2/v^Hv$, then $Px = \mp e^{i\theta}\|x\|_2e_1$. (See P5.1.3.) The sign can be determined to maximize $\|v\|_2$ for the sake of stability.

The upper triangularization of $A \in \mathbb{C}^{m\times n}$, $m \ge n$, proceeds as in Algorithm 5.2.1. In step $j$ we zero the subdiagonal portion of $A(j{:}m, j)$:

  for j = 1:n
    x = A(j:m, j)
    v = x + e^{iθ} || x ||_2 e_1, where x_1 = r e^{iθ}
    β = 2/v^H v
    A(j:m, j:n) = (I_{m-j+1} - β v v^H) A(j:m, j:n)
  end


The reduction involves $8n^2(m - n/3)$ real flops, four times the number required to execute Algorithm 5.2.1. If $Q = P_1\cdots P_n$ is the product of the Householder transformations, then $Q$ is unitary and $Q^HA = R \in \mathbb{C}^{m\times n}$ is complex and upper triangular.
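The loop above can be sketched in NumPy with complex arrays (the function name `complex_householder_qr` is hypothetical). It uses the "+" sign, $v = x + e^{i\theta}\|x\|_2e_1$, which maximizes $\|v\|_2$, and applies each $P_j = I - \beta vv^H$ to the trailing block. Unlike the flop count quoted above, this sketch also accumulates $Q = P_1\cdots P_n$ explicitly, which costs extra work.

```python
import numpy as np

def complex_householder_qr(A):
    """Householder QR of a complex m-by-n A (m >= n): A = Q R, Q unitary, R upper triangular."""
    R = np.array(A, dtype=complex)
    m, n = R.shape
    Q = np.eye(m, dtype=complex)
    for j in range(n):
        x = R[j:, j].copy()
        normx = np.linalg.norm(x)
        if normx == 0.0:
            continue                               # nothing to annihilate in this column
        phase = np.exp(1j * np.angle(x[0]))        # e^{i theta}, where x_1 = r e^{i theta}
        v = x.copy()
        v[0] += phase * normx                      # v = x + e^{i theta} ||x||_2 e_1
        beta = 2.0 / np.vdot(v, v).real            # np.vdot conjugates its first argument: v^H v
        # A(j:m, j:n) = (I - beta v v^H) A(j:m, j:n)
        R[j:, j:] -= beta * np.outer(v, v.conj() @ R[j:, j:])
        # Q <- Q P_j, so that finally Q = P_1 ... P_n and Q^H A = R
        Q[:, j:] -= beta * np.outer(Q[:, j:] @ v, v.conj())
    return Q, R

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3)) + 1j * rng.standard_normal((5, 3))
Q, R = complex_householder_qr(A)
```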


Problems 


P5.2.1 Adapt the Householder QR algorithm so that it can efficiently handle the case when $A \in \mathbb{R}^{m\times n}$ has lower bandwidth $p$ and upper bandwidth $q$.


P5.2.2 Adapt the Householder QR algorithm so that it computes the factorization $A = QL$ where $L$ is lower triangular and $Q$ is orthogonal. Assume that $A$ is square. This involves rewriting the Householder vector function $v = \mathrm{house}(x)$ so that $(I - 2vv^T/v^Tv)x$ is zero everywhere but its bottom component.


P5.2.3 Adapt the Givens QR factorization algorithm so that the zeros are introduced by diagonal. That is, the entries are zeroed in the order $(m,1)$, $(m-1,1)$, $(m,2)$, $(m-2,1)$, $(m-1,2)$, $(m,3)$, etc.

P5.2.4 Adapt the fast Givens QR factorization algorithm so that it efficiently handles the case when $A$ is $n$-by-$n$ and tridiagonal. Assume that the subdiagonal, diagonal, and superdiagonal of $A$ are stored in $e(1{:}n-1)$, $a(1{:}n)$, $f(1{:}n-1)$ respectively. Design your algorithm so that these vectors are overwritten by the nonzero portion of $T$.

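P5.2.5 Suppose $L \in \mathbb{R}^{m\times n}$ with $m \ge n$ is lower triangular. Show how Householder matrices $H_1, \ldots, H_n$ can be used to determine a lower triangular $L_1 \in \mathbb{R}^{n\times n}$ so that
$$H_n\cdots H_1L = \begin{bmatrix} L_1 \\ 0 \end{bmatrix}.$$
Hint: The second step in the 6-by-3 case involves finding $H_2$ so that
$$H_2\begin{bmatrix} \times & 0 & 0 \\ \times & \times & 0 \\ \times & \times & \times \\ \times & \times & 0 \\ \times & \times & 0 \\ \times & \times & 0 \end{bmatrix} = \begin{bmatrix} \times & 0 & 0 \\ \times & \times & 0 \\ \times & \times & \times \\ \times & 0 & 0 \\ \times & 0 & 0 \\ \times & 0 & 0 \end{bmatrix}$$
with the property that rows 1 and 3 are left alone.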
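P5.2.6 Show that if
$$A = \begin{bmatrix} R & w \\ 0 & v \end{bmatrix}\begin{matrix} k \\ m-k \end{matrix}, \qquad b = \begin{bmatrix} c \\ d \end{bmatrix}\begin{matrix} k \\ m-k \end{matrix}$$
(column blocks of width $k$ and 1) and $A$ has full column rank, then $\min\|Ax - b\|_2^2 = \|d\|_2^2 - (v^Td/\|v\|_2)^2$.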
P5.2.7 Suppose $A \in \mathbb{R}^{n\times n}$ and $D = \mathrm{diag}(d_1, \ldots, d_n) \in \mathbb{R}^{n\times n}$. Show how to construct an orthogonal $Q$ such that $Q^TA - DQ^T = R$ is upper triangular. Do not worry about efficiency; this is just an exercise in QR manipulation.
P5.2.8 Show how to compute the QR factorization of the product $A = A_p\cdots A_2A_1$ without explicitly multiplying the matrices $A_1, \ldots, A_p$ together. Hint: In the $p = 3$ case, write $Q_3^TA = Q_3^TA_3Q_2\,Q_2^TA_2Q_1\,Q_1^TA_1$ and determine orthogonal $Q_i$ so that $Q_i^T(A_iQ_{i-1})$ is upper triangular ($Q_0 = I$).
P5.2.9 Suppose $A \in \mathbb{R}^{n\times n}$ and let $E$ be the permutation obtained by reversing the order of the rows in $I_n$. (This is just the exchange matrix of §4.7.) (a) Show that if $R \in \mathbb{R}^{n\times n}$ is upper triangular, then $L = ERE$ is lower triangular. (b) Show how to compute an orthogonal $Q \in \mathbb{R}^{n\times n}$ and a lower triangular $L \in \mathbb{R}^{n\times n}$ so that $A = QL$ assuming the availability of a procedure for computing the QR factorization.
P5.2.10 MGS applied to $A \in \mathbb{R}^{m\times n}$ is numerically equivalent to the first step in Householder QR applied to
$$\tilde A = \begin{bmatrix} O_n \\ A \end{bmatrix}$$
where $O_n$ is the $n$-by-$n$ zero matrix. Verify that this statement is true after the first step of each method is completed.

P5.2.11 Reverse the loop orders in Algorithm 5.2.5 (MGS QR) so that $R$ is computed column-by-column.

P5.2.12 Develop a complex version of the Givens QR factorization. Refer to P5.1.5 where complex Givens rotations are the theme. Is it possible to organize the calculations so that the diagonal elements of $R$ are nonnegative?


Notes and References for Sec. 5.2 
The idea of using Householder transformations to solve the LS problem was proposed in


A.S. Householder (1958). “Unitary Triangularization of a Nonsymmetric Matrix,” J. 
ACM, 5, 339-42. 


The practical details were worked out in 


P. Businger and G.H. Golub (1965). “Linear Least Squares Solutions by Householder Transformations,” Numer. Math. 7, 269-76. See also Wilkinson and Reinsch (1971, 111-18).
G.H. Golub (1965). “Numerical Methods for Solving Linear Least Squares Problems,” 
Numer. Math. 7, 206-16. 
The basic references on QR via Givens rotations include


W. Givens (1958). “Computation of Plane Unitary Rotations Transforming a General 
Matrix to Triangular Form,” SIAM J. App. Math. 6, 26-50. 

M. Gentleman (1973). “Error Analysis of QR Decompositions by Givens Transformations,” Lin. Alg. and Its Applic. 10, 189-97.


For a discussion of how the QR factorization can be used to solve numerous problems in 
statistical computation, see 


G.H. Golub (1989). “Matrix Decompositions and Statistical Computation,” in Statistical Computation, ed. R.C. Milton and J.A. Nelder, Academic Press, New York, pp. 365-97.


The behavior of the Q and R factors when A is perturbed is discussed in


G.W. Stewart (1977). “Perturbation Bounds for the QR Factorization of a Matrix,” 
SIAM J. Num. Anal. 14, 509-18. 

H. Zha (1993). “A Componentwise Perturbation Analysis of the QR Decomposition,” SIAM J. Matrix Anal. Appl. 14, 1124-1131.

G.W. Stewart (1993). “On the Perturbation of LU, Cholesky, and QR Factorizations,” SIAM J. Matrix Anal. Appl. 14, 1141-1145.

A. Barrlund (1994). “Perturbation Bounds for the Generalized QR Factorization,” Lin. Alg. and Its Applic. 207, 251-271.

J.-G. Sun (1995). “On Perturbation Bounds for the QR Factorization,” Lin. Alg. and Its Applic. 215, 95-112.


The main result is that the changes in Q and R are bounded by the condition of A times
the relative change in A. Organizing the computation so that the entries in Q depend 
continuously on the entries in A is discussed in 


T.F. Coleman and D.C. Sorensen (1984). “A Note on the Computation of an Orthonor- 
mal Basis for the Null Space of a Matrix,” Mathematical Programming 29, 234-242. 


References for the Gram-Schmidt process include


J.R. Rice (1966). “Experiments on Gram-Schmidt Orthogonalization,” Math. Comp. 20, 325-28.

A. Björck (1967). “Solving Linear Least Squares Problems by Gram-Schmidt Orthogonalization,” BIT 7, 1-21.

N.N. Abdelmalek (1971). “Roundoff Error Analysis for Gram-Schmidt Method and Solution of Linear Least Squares Problems,” BIT 11, 345-68.

J. Daniel, W.B. Gragg, L. Kaufman, and G.W. Stewart (1976). “Reorthogonalization and Stable Algorithms for Updating the Gram-Schmidt QR Factorization,” Math. Comp. 30, 772-795.

A. Ruhe (1983). “Numerical Aspects of Gram-Schmidt Orthogonalization of Vectors,” Lin. Alg. and Its Applic. 52/53, 581-601.

W. Jalby and B. Philippe (1991). "Stability Analysis and Improvement of the Block 
Gram-Schmidt Algorithm,” SIAM J. Sci. Stat. Comp. 12, 1058-1073. 

A. Björck and C.C. Paige (1992). “Loss and Recapture of Orthogonality in the Modified Gram-Schmidt Algorithm,” SIAM J. Matrix Anal. Appl. 13, 176-190.

A. Björck (1994). “Numerics of Gram-Schmidt Orthogonalization,” Lin. Alg. and Its Applic. 197/198, 291-316.


The QR factorization of a structured matrix is usually structured itself. See 


A.W. Bojanczyk, R.P. Brent, and F.R. de Hoog (1986). “QR Factorization of Toeplitz Matrices,” Numer. Math. 49, 81-94.
S. Qiao (1986). “Hybrid Algorithm for Fast Toeplitz Orthogonalization,” Numer. Math. 53, 351-366.

C.J. Demeure (1989). “Fast QR Factorization of Vandermonde Matrices,” Lin. Alg. and Its Applic. 122/123/124, 165-194.

L. Reichel (1991). “Fast QR Decomposition of Vandermonde-Like Matrices and Polynomial Least Squares Approximation,” SIAM J. Matrix Anal. Appl. 12, 552-564.

D.R. Sweet (1991). "Fast Block Toeplitz Orthogonalization,” Numer. Math. 58, 613- 
629. 


Various high-performance issues pertaining to the QR factorization are discussed in 

B. Mattingly, C. Meyer, and J. Ortega (1989). “Orthogonal Reduction on Vector Computers,” SIAM J. Sci. and Stat. Comp. 10, 372-381.

P.A. Knight (1995). “Fast Rectangular Matrix Multiplication and the QR Decomposition,” Lin. Alg. and Its Applic. 281, 69-81.


5.3 The Full Rank LS Problem 


Consider the problem of finding a vector $x \in \mathbb{R}^n$ such that $Ax = b$ where the data matrix $A \in \mathbb{R}^{m\times n}$ and the observation vector $b \in \mathbb{R}^m$ are given and $m \ge n$. When there are more equations than unknowns, we say that the system $Ax = b$ is overdetermined. Usually an overdetermined system has no exact solution since $b$ must be an element of $\mathrm{ran}(A)$, a proper subspace of $\mathbb{R}^m$.

This suggests that we strive to minimize $\|Ax - b\|_p$ for some suitable choice of $p$. Different norms render different optimum solutions. For example, if $A = [\,1, 1, 1\,]^T$ and $b = [\,b_1, b_2, b_3\,]^T$ with $b_1 \ge b_2 \ge b_3 \ge 0$, then it can be verified that
$$p = 1 \;\Rightarrow\; x_{opt} = b_2,$$
$$p = 2 \;\Rightarrow\; x_{opt} = (b_1 + b_2 + b_3)/3,$$
$$p = \infty \;\Rightarrow\; x_{opt} = (b_1 + b_3)/2.$$
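These three minimizers are easy to confirm numerically. The sketch below is a brute-force check, not a solution method. It assumes NumPy and the illustrative data $b = [\,5, 2, 1\,]^T$, evaluates $\|Ax - b\|_p$ on a fine grid of scalars $x$, and reports the best grid point for each $p$.

```python
import numpy as np

b = np.array([5.0, 2.0, 1.0])          # b1 >= b2 >= b3 >= 0 (illustrative values)
xs = np.linspace(0.0, 6.0, 60001)      # candidate scalars x, spacing 1e-4
res = xs[:, None] - b[None, :]         # row i holds A x_i - b, with A = [1, 1, 1]^T

x_opt = {p: xs[np.argmin(np.linalg.norm(res, ord=p, axis=1))] for p in (1, 2, np.inf)}
print(x_opt)   # compare with b2 = 2, (b1 + b2 + b3)/3 = 8/3, (b1 + b3)/2 = 3
```

The 1-norm picks the median, the 2-norm the mean, and the $\infty$-norm the midrange.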


Minimization in the 1-norm and $\infty$-norm is complicated by the fact that the function $f(x) = \|Ax - b\|_p$ is not differentiable for these values of $p$. However, much progress has been made in this area, and there are several good techniques available for 1-norm and $\infty$-norm minimization. See Coleman and Li (1992), Li (1993), and Zhang (1993).

In contrast to general $p$-norm minimization, the least squares (LS) problem
$$\min_{x \in \mathbb{R}^n} \|Ax - b\|_2 \qquad (5.3.1)$$
is more tractable for two reasons:


* $\phi(x) = \frac{1}{2}\|Ax - b\|_2^2$ is a differentiable function of $x$ and so the minimizers of $\phi$ satisfy the gradient equation $\nabla\phi(x) = 0$. This turns out to be an easily constructed symmetric linear system which is positive definite if $A$ has full column rank.




* The 2-norm is preserved under orthogonal transformation. This means that we can seek an orthogonal $Q$ such that the equivalent problem of minimizing $\|(Q^TA)x - (Q^Tb)\|_2$ is "easy" to solve.


In this section we pursue these two solution approaches for the case when 
A has full column rank. Methods based on normal equations and the QR 
factorization are detailed and compared. 


5.3.1 Implications of Full Rank 
Suppose $x \in \mathbb{R}^n$, $z \in \mathbb{R}^n$, and $\alpha \in \mathbb{R}$ and consider the equality
$$\|A(x + \alpha z) - b\|_2^2 = \|Ax - b\|_2^2 + 2\alpha z^TA^T(Ax - b) + \alpha^2\|Az\|_2^2$$
where $A \in \mathbb{R}^{m\times n}$ and $b \in \mathbb{R}^m$. If $x$ solves the LS problem (5.3.1) then we must have $A^T(Ax - b) = 0$. Otherwise, if $z = -A^T(Ax - b)$ and we make $\alpha$ small enough, then we obtain the contradictory inequality $\|A(x + \alpha z) - b\|_2 < \|Ax - b\|_2$. We may also conclude that if $x$ and $x + \alpha z$ are LS minimizers, then $z \in \mathrm{null}(A)$.

Thus, if $A$ has full column rank, then there is a unique LS solution $x_{LS}$ and it solves the symmetric positive definite linear system
$$A^TAx_{LS} = A^Tb.$$


These are called the normal equations. Since $\nabla\phi(x) = A^T(Ax - b)$ where $\phi(x) = \frac{1}{2}\|Ax - b\|_2^2$, we see that solving the normal equations is tantamount to solving the gradient equation $\nabla\phi = 0$. We call
$$r_{LS} = b - Ax_{LS}$$
the minimum residual and we use the notation
$$\rho_{LS} = \|Ax_{LS} - b\|_2$$
to denote its size. Note that if $\rho_{LS}$ is small, then we can "predict" $b$ with the columns of $A$.
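The characterization $A^Tr_{LS} = 0$ can be checked with any reliable LS solver. The sketch below assumes NumPy and random test data, and uses `np.linalg.lstsq` as a reference solver. It confirms that the minimum residual is orthogonal to the columns of $A$, that $x_{LS}$ satisfies the normal equations, and that moving away from $x_{LS}$ in any direction $z$ increases the residual, since $\|A(x_{LS} + \alpha z) - b\|_2^2 = \rho_{LS}^2 + \alpha^2\|Az\|_2^2$.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 4))
b = rng.standard_normal(20)

x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)   # reference LS solution
r_ls = b - A @ x_ls                             # minimum residual r_LS
rho_ls = np.linalg.norm(r_ls)                   # rho_LS = ||A x_LS - b||_2

grad = A.T @ (A @ x_ls - b)                     # gradient of phi at x_LS, ~0
z = rng.standard_normal(4)
perturbed = np.linalg.norm(A @ (x_ls + 1e-3 * z) - b)
print(np.linalg.norm(grad), rho_ls, perturbed)
```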

So far we have been assuming that $A \in \mathbb{R}^{m\times n}$ has full column rank. This assumption is dropped in §5.5. However, even if $\mathrm{rank}(A) = n$, then we can expect trouble in the above procedures if $A$ is nearly rank deficient.

When assessing the quality of a computed LS solution $\hat x_{LS}$, there are two important issues to bear in mind:

* How close is $\hat x_{LS}$ to $x_{LS}$?

* How small is $\hat r_{LS} = b - A\hat x_{LS}$ compared to $r_{LS} = b - Ax_{LS}$?

The relative importance of these two criteria varies from application to application. In any case it is important to understand how $x_{LS}$ and $r_{LS}$ are affected by perturbations in $A$ and $b$. Our intuition tells us that if the columns of $A$ are nearly dependent, then these quantities may be quite sensitive.


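Example 5.3.1 Suppose
$$A = \begin{bmatrix} 1 & 0 \\ 0 & 10^{-6} \\ 0 & 0 \end{bmatrix}, \quad \delta A = \begin{bmatrix} 0 & 0 \\ 0 & 0 \\ 0 & 10^{-8} \end{bmatrix}, \quad b = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}, \quad \delta b = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix},$$
and that $x_{LS}$ and $\hat x_{LS}$ minimize $\|Ax - b\|_2$ and $\|(A + \delta A)x - (b + \delta b)\|_2$ respectively. Let $r_{LS}$ and $\hat r_{LS}$ be the corresponding minimum residuals. Then
$$x_{LS} = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \quad \hat x_{LS} = \begin{bmatrix} 1 \\ .9999\cdot 10^{4} \end{bmatrix}, \quad r_{LS} = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}, \quad \hat r_{LS} = \begin{bmatrix} 0 \\ -.9999\cdot 10^{-2} \\ .9999\cdot 10^{0} \end{bmatrix}.$$
Since $\kappa_2(A) = 10^6$ we have
$$\frac{\|\hat x_{LS} - x_{LS}\|_2}{\|x_{LS}\|_2} \approx .9999\cdot 10^{4} \le \kappa_2(A)^2\,\frac{\|\delta A\|_2}{\|A\|_2} = 10^{12}\cdot 10^{-8}$$
and
$$\frac{\|\hat r_{LS} - r_{LS}\|_2}{\|b\|_2} \approx .7070\cdot 10^{-2} \le \kappa_2(A)\,\frac{\|\delta A\|_2}{\|A\|_2} = 10^{6}\cdot 10^{-8}.$$
The example suggests that the sensitivity of $x_{LS}$ depends upon $\kappa_2(A)^2$. At the end of this section we develop a perturbation theory for the LS problem and the $\kappa_2(A)^2$ factor will return.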
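Example 5.3.1 can be reproduced in double precision. The sketch below assumes NumPy and uses `np.linalg.lstsq` as the LS solver; at $\kappa_2 \approx 10^6$ it resolves both problems to far more digits than the effects being illustrated. It shows the $10^4$-fold relative change in $x_{LS}$ next to the much smaller change in the residual.

```python
import numpy as np

A  = np.array([[1.0, 0.0], [0.0, 1e-6], [0.0, 0.0]])
dA = np.array([[0.0, 0.0], [0.0, 0.0], [0.0, 1e-8]])
b  = np.array([1.0, 0.0, 1.0])                  # delta b = 0

x,  *_ = np.linalg.lstsq(A, b, rcond=None)      # x_LS
xh, *_ = np.linalg.lstsq(A + dA, b, rcond=None) # solution of the perturbed problem
r, rh = b - A @ x, b - (A + dA) @ xh            # r_LS and the perturbed residual

rel_x = np.linalg.norm(xh - x) / np.linalg.norm(x)
rel_r = np.linalg.norm(rh - r) / np.linalg.norm(b)
print(xh, rel_x, rel_r)
```

Analytically, the second component of the perturbed solution is $10^{-8}/(10^{-12} + 10^{-16}) \approx 9999.0001$, which matches the $.9999\cdot 10^4$ quoted above.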


5.3.2 The Method of Normal Equations


The most widely used method for solving the full rank LS problem is the 
method of normal equations. 


Algorithm 5.3.1 (Normal Equations) Given $A \in \mathbb{R}^{m\times n}$ with the property that $\mathrm{rank}(A) = n$ and $b \in \mathbb{R}^m$, this algorithm computes the solution $x_{LS}$ to the LS problem $\min\|Ax - b\|_2$ where $b \in \mathbb{R}^m$.

  Compute the lower triangular portion of C = A^T A.
  d = A^T b
  Compute the Cholesky factorization C = G G^T.
  Solve G y = d and G^T x_LS = y.


This algorithm requires $(m + n/3)n^2$ flops. The normal equation approach
is convenient because it relies on standard algorithms: Cholesky factoriza- 
tion, matrix-matrix multiplication, and matrix-vector multiplication. The 
compression of the m-by-n data matrix A into the (typically) much smaller 
n-by-n cross-product matrix C is attractive. 
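A NumPy sketch of Algorithm 5.3.1 follows; the helper names are hypothetical. It forms $C = A^TA$ and $d = A^Tb$, factors $C = GG^T$ with `np.linalg.cholesky` (which returns the lower triangular factor), and finishes with the two triangular solves. For simplicity it forms all of $C$ rather than only its lower triangle.

```python
import numpy as np

def forward_sub(L, y):
    """Solve L z = y for lower triangular L."""
    n = len(y)
    z = np.zeros(n)
    for i in range(n):
        z[i] = (y[i] - L[i, :i] @ z[:i]) / L[i, i]
    return z

def back_sub(U, y):
    """Solve U z = y for upper triangular U."""
    n = len(y)
    z = np.zeros(n)
    for i in range(n - 1, -1, -1):
        z[i] = (y[i] - U[i, i+1:] @ z[i+1:]) / U[i, i]
    return z

def normal_equations_ls(A, b):
    C = A.T @ A                   # cross-product matrix (only its lower triangle is needed)
    d = A.T @ b
    G = np.linalg.cholesky(C)     # C = G G^T, G lower triangular
    y = forward_sub(G, d)         # G y = d
    return back_sub(G.T, y)       # G^T x_LS = y

rng = np.random.default_rng(0)
A = rng.standard_normal((30, 5))
b = rng.standard_normal(30)
x_ne = normal_equations_ls(A, b)
```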
Let us consider the accuracy of the computed normal equations solution $\hat x_{LS}$. For clarity, assume that no roundoff errors occur during the formation of $C = A^TA$ and $d = A^Tb$. (On many computers inner products are accumulated in double precision and so this is not a terribly unfair assumption.) It follows from what we know about the roundoff properties of the Cholesky factorization (cf. §4.2.7) that
$$(A^TA + E)\,\hat x_{LS} = A^Tb,$$
where $\|E\|_2 \approx u\,\|A^T\|_2\|A\|_2 \approx u\,\|A^TA\|_2$, and thus we can expect
$$\frac{\|\hat x_{LS} - x_{LS}\|_2}{\|x_{LS}\|_2} \approx u\,\kappa_2(A^TA) = u\,\kappa_2(A)^2. \qquad (5.3.2)$$
In other words, the accuracy of the computed normal equations solution depends on the square of the condition. This seems to be consistent with Example 5.3.1 but more refined comments follow in §5.3.9.


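Example 5.3.2 It should be noted that the formation of $A^TA$ can result in a severe loss of information. If
$$A = \begin{bmatrix} 1 & 1 \\ 10^{-3} & 0 \\ 0 & 10^{-3} \end{bmatrix} \quad\text{and}\quad b = \begin{bmatrix} 2 \\ 10^{-3} \\ 10^{-3} \end{bmatrix},$$
then $\kappa_2(A) \approx 1.4\cdot 10^3$, $x_{LS} = [\,1\;\;1\,]^T$, and $\rho_{LS} = 0$. If the normal equations method is executed with base 10, $t = 6$ arithmetic, then a divide-by-zero occurs during the solution process, since
$$fl(A^TA) = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}$$
is exactly singular. On the other hand, if 7-digit arithmetic is used, then $\hat x_{LS} = [\,2.000001,\;0\,]^T$ and $\|\hat x_{LS} - x_{LS}\|_2/\|x_{LS}\|_2 \approx u\,\kappa_2(A)^2$.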
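The collapse of $fl(A^TA)$ is easy to reproduce with Python's `decimal` module, whose context precision emulates base 10, $t$-digit arithmetic. This sketch covers only the formation of $A^TA$ and its determinant; the Cholesky step is omitted, and the helper name `gram_det` is hypothetical. With 6 digits, $1 + 10^{-6}$ rounds to $1$ and the computed matrix is exactly singular. With 7 digits it survives, but its determinant is only $2\cdot 10^{-6}$.

```python
from decimal import Decimal, localcontext

A = [["1", "1"], ["1e-3", "0"], ["0", "1e-3"]]   # the 3-by-2 matrix of Example 5.3.2

def gram_det(digits):
    """Form C = A^T A in `digits`-digit decimal arithmetic; return C and det(C)."""
    with localcontext() as ctx:
        ctx.prec = digits                         # every operation rounds to `digits` digits
        M = [[Decimal(x) for x in row] for row in A]
        C = [[sum((M[k][i] * M[k][j] for k in range(3)), Decimal(0)) for j in range(2)]
             for i in range(2)]
        det = C[0][0] * C[1][1] - C[0][1] * C[1][0]
        return C, det

C6, det6 = gram_det(6)
C7, det7 = gram_det(7)
print(C6, det6)
print(C7, det7)
```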


5.3.3 LS Solution Via QR Factorization 
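Let $A \in \mathbb{R}^{m\times n}$ with $m \ge n$ and $b \in \mathbb{R}^m$ be given and suppose that an orthogonal matrix $Q \in \mathbb{R}^{m\times m}$ has been computed such that
$$Q^TA = R = \begin{bmatrix} R_1 \\ 0 \end{bmatrix}\begin{matrix} n \\ m-n \end{matrix} \qquad (5.3.3)$$
is upper triangular. If
$$Q^Tb = \begin{bmatrix} c \\ d \end{bmatrix}\begin{matrix} n \\ m-n \end{matrix}$$
then
$$\|Ax - b\|_2^2 = \|Q^TAx - Q^Tb\|_2^2 = \|R_1x - c\|_2^2 + \|d\|_2^2$$
for any $x \in \mathbb{R}^n$. Clearly, if $\mathrm{rank}(A) = \mathrm{rank}(R_1) = n$, then $x_{LS}$ is defined by the upper triangular system $R_1x_{LS} = c$. Note that
$$\rho_{LS} = \|d\|_2.$$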


We conclude that the full rank LS problem can be readily solved once we have computed the QR factorization of $A$. Details depend on the exact QR procedure. If Householder matrices are used and $Q^T$ is applied in factored form to $b$, then we obtain


Algorithm 5.3.2 (Householder LS Solution) If $A \in \mathbb{R}^{m\times n}$ has full column rank and $b \in \mathbb{R}^m$, then the following algorithm computes a vector $x_{LS} \in \mathbb{R}^n$ such that $\|Ax_{LS} - b\|_2$ is minimum.

  Use Algorithm 5.2.1 to overwrite A with its QR factorization.
  for j = 1:n
    v(j) = 1; v(j+1:m) = A(j+1:m, j)
    b(j:m) = (I_{m-j+1} - β_j v v^T) b(j:m)
  end
  Solve R(1:n, 1:n) x_LS = b(1:n) using back substitution.


This method for solving the full rank LS problem requires $2n^2(m - n/3)$ flops. The $O(mn)$ flops associated with the updating of $b$ and the $O(n^2)$
flops associated with the back substitution are not significant compared to 
the work required to factor A. 
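The sketch below mirrors Algorithm 5.3.2 in NumPy, with one simplification that departs from the text. Each reflection is applied to $b$ as soon as it is generated, instead of first storing the Householder vectors in $A$ and then sweeping over $b$. The arithmetic on $b$ is the same. The function name `householder_ls` is hypothetical, and the vectors are not normalized to $v(1) = 1$.

```python
import numpy as np

def householder_ls(A, b):
    """min ||Ax - b||_2 for full column rank A via Householder QR (Q^T b never formed as a matrix)."""
    R = np.array(A, dtype=float)
    c = np.array(b, dtype=float)
    m, n = R.shape
    for j in range(n):
        x = R[j:, j]
        alpha = np.linalg.norm(x)
        if alpha == 0.0:
            continue
        v = x.copy()
        v[0] += np.copysign(alpha, x[0])                   # sign chosen to avoid cancellation
        beta = 2.0 / (v @ v)
        R[j:, j:] -= beta * np.outer(v, v @ R[j:, j:])     # (I - beta v v^T) A(j:m, j:n)
        c[j:] -= beta * v * (v @ c[j:])                    # (I - beta v v^T) b(j:m)
    x_ls = np.zeros(n)
    for i in range(n - 1, -1, -1):                         # back substitution with R(1:n, 1:n)
        x_ls[i] = (c[i] - R[i, i+1:] @ x_ls[i+1:]) / R[i, i]
    rho_ls = np.linalg.norm(c[n:])                         # rho_LS = ||d||_2
    return x_ls, rho_ls

rng = np.random.default_rng(0)
A = rng.standard_normal((12, 4))
b = rng.standard_normal(12)
x_ls, rho_ls = householder_ls(A, b)
```

As a by-product, the norm of the last $m - n$ components of the transformed $b$ gives $\rho_{LS}$ without forming the residual.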

It can be shown that the computed $\hat x_{LS}$ solves
$$\min\|(A + \delta A)x - (b + \delta b)\|_2 \qquad (5.3.4)$$
where
$$\|\delta A\|_F \le (6m - 3n + 41)\,nu\,\|A\|_F + O(u^2) \qquad (5.3.5)$$
and
$$\|\delta b\|_2 \le (6m - 3n + 40)\,nu\,\|b\|_2 + O(u^2). \qquad (5.3.6)$$
These inequalities are established in Lawson and Hanson (1974, p.90ff) and show that $\hat x_{LS}$ satisfies a "nearby" LS problem. (We cannot address the relative error in $\hat x_{LS}$ without an LS perturbation theory, to be discussed shortly.) We mention that similar results hold if Givens QR is used.


5.3.4 Breakdown in Near-Rank Deficient Case 


Like the method of normal equations, the Householder method for solving the LS problem breaks down in the back substitution phase if $\mathrm{rank}(A) < n$. Numerically, trouble can be expected whenever $\kappa_2(A) = \kappa_2(R) \approx 1/u$. This is in contrast to the normal equations approach, where completion of the Cholesky factorization becomes problematical once $\kappa_2(A)$ is in the neighborhood of $1/\sqrt{u}$. (See Example 5.3.2.) Hence the claim in Lawson and Hanson (1974, 126-127) that for a fixed machine precision, a wider class of LS problems can be solved using Householder orthogonalization.
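To see the two thresholds in IEEE double precision ($u \approx 1.1\cdot 10^{-16}$, so $1/\sqrt{u} \approx 10^8$), the experiment below builds a consistent problem with $\kappa_2(A) = 10^9$. The sizes and the seed are arbitrary choices. The QR route ($R_1x = Q_1^Tb$, using NumPy's Householder-based `np.linalg.qr`) still delivers several correct digits. The normal equations route either fails in `np.linalg.cholesky`, which raises `LinAlgError` when the computed $A^TA$ is not positive definite, or returns a solution with essentially no correct digits.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 30, 5
U, _ = np.linalg.qr(rng.standard_normal((m, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = U @ np.diag(np.logspace(0, -9, n)) @ V.T     # kappa_2(A) = 1e9 > 1/sqrt(u)
x_true = np.ones(n)
b = A @ x_true                                    # consistent system: rho_LS ~ 0

def rel_err(x):
    return np.linalg.norm(x - x_true) / np.linalg.norm(x_true)

# QR route: R1 x = Q1^T b
Q1, R1 = np.linalg.qr(A)
err_qr = rel_err(np.linalg.solve(R1, Q1.T @ b))

# Normal equations route: the Cholesky factorization of A^T A may break down
try:
    G = np.linalg.cholesky(A.T @ A)
    err_ne = rel_err(np.linalg.solve(G.T, np.linalg.solve(G, A.T @ b)))
except np.linalg.LinAlgError:
    err_ne = None                                 # Cholesky failed outright
print(err_qr, err_ne)
```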


5.3.5 A Note on the MGS Approach 


In principle, MGS computes the thin QR factorization $A = Q_1R_1$. This is enough to solve the full rank LS problem because it transforms the normal equations $(A^TA)x = A^Tb$ to the upper triangular system $R_1x = Q_1^Tb$. But an analysis of this approach when $Q_1^Tb$ is explicitly formed introduces a $\kappa_2(A)^2$ term. This is because the computed factor $\hat Q_1$ satisfies $\|\hat Q_1^T\hat Q_1 - I_n\|_2 \approx u\,\kappa_2(A)$ as we mentioned in §5.2.9.

However, if MGS is applied to the augmented matrix
$$A_+ = [\,A \;\; b\,] = [\,Q_1 \;\; q_{n+1}\,]\begin{bmatrix} R_1 & z \\ 0 & \rho \end{bmatrix},$$
then $z = Q_1^Tb$. Computing $Q_1^Tb$ in this fashion and solving $R_1x_{LS} = z$ produces an LS solution $\hat x_{LS}$ that is "just as good" as the Householder QR method. That is to say, a result of the form (5.3.4)-(5.3.6) applies. See Björck and Paige (1992).

It should be noted that the MGS method is slightly more expensive than Householder QR because it always manipulates $m$-vectors whereas the latter procedure deals with ever shorter vectors.
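The augmented-matrix idea takes only a few lines. In the sketch below (hypothetical helper names, NumPy assumed), MGS is run on $[\,A\;\;b\,]$. The last column of the resulting triangular factor supplies $z = Q_1^Tb$ in its first $n$ entries and $\rho = \rho_{LS}$ in its last entry, so $Q_1^Tb$ is never formed from the computed $\hat Q_1$.

```python
import numpy as np

def mgs(A):
    """Modified Gram-Schmidt (Algorithm 5.2.5), returning Q1 and R1."""
    V = np.array(A, dtype=float)
    m, n = V.shape
    Q, R = np.zeros((m, n)), np.zeros((n, n))
    for k in range(n):
        R[k, k] = np.linalg.norm(V[:, k])
        Q[:, k] = V[:, k] / R[k, k]
        R[k, k+1:] = Q[:, k] @ V[:, k+1:]
        V[:, k+1:] -= np.outer(Q[:, k], R[k, k+1:])
    return Q, R

def mgs_ls(A, b):
    """LS solution via MGS applied to the augmented matrix A_+ = [A b]."""
    n = A.shape[1]
    _, Rp = mgs(np.column_stack([A, b]))
    R1, z, rho = Rp[:n, :n], Rp[:n, n], Rp[n, n]
    return np.linalg.solve(R1, z), rho       # R1 x_LS = z;  rho = ||r_LS||_2

rng = np.random.default_rng(0)
A = rng.standard_normal((15, 4))
b = rng.standard_normal(15)                   # generic b, so rho > 0
x_ls, rho = mgs_ls(A, b)
```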


5.3.6 Fast Givens LS Solver 


The LS problem can also be solved using fast Givens transformations. Suppose $M^TM = D$ is diagonal and
$$M^TA = \begin{bmatrix} S_1 \\ 0 \end{bmatrix}\begin{matrix} n \\ m-n \end{matrix}$$
is upper triangular. If
$$M^Tb = \begin{bmatrix} c \\ d \end{bmatrix}\begin{matrix} n \\ m-n \end{matrix}$$
then
$$\|Ax - b\|_2^2 = \|D^{-1/2}M^T(Ax - b)\|_2^2 = \left\|D^{-1/2}\left(\begin{bmatrix} S_1 \\ 0 \end{bmatrix}x - \begin{bmatrix} c \\ d \end{bmatrix}\right)\right\|_2^2$$
for any $x \in \mathbb{R}^n$. Clearly, $x_{LS}$ is obtained by solving the nonsingular upper triangular system $S_1x = c$.

The computed solution $\hat x_{LS}$ obtained in this fashion can be shown to solve a nearby LS problem in the sense of (5.3.4)-(5.3.6). This may seem surprising since large numbers can arise during the calculation. An entry in the scaling matrix $D$ can double in magnitude after a single fast Givens update. However, largeness in $D$ must be exactly compensated for by largeness in $M$, since $MD^{-1/2}$ is orthogonal at all stages of the computation. It is this phenomenon that enables one to push through a favorable error analysis.


5.3.7 The Sensitivity of the LS Problem 


We now develop a perturbation theory that assists in the comparison of the normal equations and QR approaches to the LS problem. The theorem below examines how the LS solution and its residual are affected by changes in $A$ and $b$. In so doing, the condition of the LS problem is identified. Two easily established facts are required in the analysis:
$$\|A\|_2\,\|(A^TA)^{-1}A^T\|_2 = \kappa_2(A),$$
$$\|A\|_2^2\,\|(A^TA)^{-1}\|_2 = \kappa_2(A)^2. \qquad (5.3.7)$$
These equations can be verified using the SVD.
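The two identities in (5.3.7) can be spot-checked numerically for a random full column rank matrix. In the sketch below (NumPy assumed, test matrix arbitrary), $(A^TA)^{-1}A^T$ is the pseudo-inverse $A^+$, whose 2-norm is $1/\sigma_n$, and $\|(A^TA)^{-1}\|_2 = 1/\sigma_n^2$.

```python
import numpy as np

def norm2(M):
    """Spectral norm ||M||_2."""
    return np.linalg.norm(M, 2)

rng = np.random.default_rng(0)
A = rng.standard_normal((10, 4))
s = np.linalg.svd(A, compute_uv=False)
kappa = s[0] / s[-1]

pinvA = np.linalg.solve(A.T @ A, A.T)                 # (A^T A)^{-1} A^T = A^+
lhs1 = norm2(A) * norm2(pinvA)                        # should equal kappa_2(A)
lhs2 = norm2(A) ** 2 * norm2(np.linalg.inv(A.T @ A))  # should equal kappa_2(A)^2
print(lhs1, kappa, lhs2, kappa ** 2)
```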
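Theorem 5.3.1 Suppose $x$, $r$, $\hat x$, and $\hat r$ satisfy
$$\|Ax - b\|_2 = \min, \qquad r = b - Ax,$$
$$\|(A + \delta A)\hat x - (b + \delta b)\|_2 = \min, \qquad \hat r = (b + \delta b) - (A + \delta A)\hat x,$$
where $A$ and $\delta A$ are in $\mathbb{R}^{m\times n}$ with $m \ge n$ and $0 \ne b$ and $\delta b$ are in $\mathbb{R}^m$. If
$$\epsilon = \max\left\{\frac{\|\delta A\|_2}{\|A\|_2}, \frac{\|\delta b\|_2}{\|b\|_2}\right\} < \frac{\sigma_n(A)}{\sigma_1(A)}$$
and
$$\sin(\theta) = \frac{\rho_{LS}}{\|b\|_2} \ne 1,$$
where $\rho_{LS} = \|Ax_{LS} - b\|_2$, then
$$\frac{\|\hat x - x\|_2}{\|x\|_2} \le \epsilon\left\{\frac{2\kappa_2(A)}{\cos(\theta)} + \tan(\theta)\,\kappa_2(A)^2\right\} + O(\epsilon^2), \qquad (5.3.8)$$
$$\frac{\|\hat r - r\|_2}{\|b\|_2} \le \epsilon\,\bigl(1 + 2\kappa_2(A)\bigr)\min(1, m - n) + O(\epsilon^2). \qquad (5.3.9)$$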
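Proof. Let $E$ and $f$ be defined by $E = \delta A/\epsilon$ and $f = \delta b/\epsilon$. By hypothesis $\|\delta A\|_2 < \sigma_n(A)$ and so by Theorem 2.5.2 we have $\mathrm{rank}(A + tE) = n$ for all $t \in [0, \epsilon]$. It follows that the solution $x(t)$ to
$$(A + tE)^T(A + tE)\,x(t) = (A + tE)^T(b + tf) \qquad (5.3.10)$$
is continuously differentiable for all $t \in [0, \epsilon]$. Since $x = x(0)$ and $\hat x = x(\epsilon)$, we have
$$\hat x = x + \epsilon\,\dot x(0) + O(\epsilon^2).$$
The assumptions $b \ne 0$ and $\sin(\theta) \ne 1$ ensure that $x$ is nonzero and so
$$\frac{\|\hat x - x\|_2}{\|x\|_2} = \epsilon\,\frac{\|\dot x(0)\|_2}{\|x\|_2} + O(\epsilon^2). \qquad (5.3.11)$$
In order to bound $\|\dot x(0)\|_2$ we differentiate (5.3.10) and set $t = 0$ in the result. This gives
$$E^TAx + A^TEx + A^TA\,\dot x(0) = A^Tf + E^Tb,$$
i.e.,
$$\dot x(0) = (A^TA)^{-1}A^T(f - Ex) + (A^TA)^{-1}E^Tr. \qquad (5.3.12)$$
By substituting this result into (5.3.11), taking norms, and using the easily verified inequalities $\|f\|_2 \le \|b\|_2$ and $\|E\|_2 \le \|A\|_2$ we obtain
$$\frac{\|\hat x - x\|_2}{\|x\|_2} \le \epsilon\left\{\|A\|_2\,\|(A^TA)^{-1}A^T\|_2\left(\frac{\|b\|_2}{\|A\|_2\|x\|_2} + 1\right) + \frac{\rho_{LS}}{\|A\|_2\|x\|_2}\,\|A\|_2^2\,\|(A^TA)^{-1}\|_2\right\} + O(\epsilon^2).$$
Since $A^T(Ax - b) = 0$, $Ax$ is orthogonal to $Ax - b$ and so
$$\|b - Ax\|_2^2 + \|Ax\|_2^2 = \|b\|_2^2.$$
Thus,
$$\|A\|_2^2\,\|x\|_2^2 \ge \|b\|_2^2 - \rho_{LS}^2 = \|b\|_2^2\cos^2(\theta)$$
and so by using (5.3.7)
$$\frac{\|\hat x - x\|_2}{\|x\|_2} \le \epsilon\left\{\kappa_2(A)\left(\frac{1}{\cos(\theta)} + 1\right) + \kappa_2(A)^2\,\frac{\sin(\theta)}{\cos(\theta)}\right\} + O(\epsilon^2).$$
Since $1 \le 1/\cos(\theta)$, this establishes (5.3.8).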
To prove (5.3.9), we define the differentiable vector function $r(t)$ by
$$r(t) = (b + tf) - (A + tE)\,x(t)$$
and observe that $r = r(0)$ and $\hat r = r(\epsilon)$. Using (5.3.12) it can be shown that
$$\dot r(0) = \bigl(I - A(A^TA)^{-1}A^T\bigr)(f - Ex) - A(A^TA)^{-1}E^Tr.$$
Since $\|\hat r - r\|_2 = \epsilon\,\|\dot r(0)\|_2 + O(\epsilon^2)$ we have
$$\frac{\|\hat r - r\|_2}{\|b\|_2} = \epsilon\,\frac{\|\dot r(0)\|_2}{\|b\|_2} + O(\epsilon^2)$$
$$\le \epsilon\left\{\|I - A(A^TA)^{-1}A^T\|_2\left(1 + \frac{\|A\|_2\|x\|_2}{\|b\|_2}\right) + \|A(A^TA)^{-1}\|_2\,\|A\|_2\,\frac{\rho_{LS}}{\|b\|_2}\right\} + O(\epsilon^2).$$
Inequality (5.3.9) now follows because
$$\|A\|_2\|x\|_2 = \|A\|_2\,\|A^+b\|_2 \le \kappa_2(A)\,\|b\|_2,$$
$$\rho_{LS} = \|(I - A(A^TA)^{-1}A^T)\,b\|_2 \le \|I - A(A^TA)^{-1}A^T\|_2\,\|b\|_2,$$
and
$$\|I - A(A^TA)^{-1}A^T\|_2 = \min(m - n, 1). \qquad \square$$


An interesting feature of the upper bound in (5.3.8) is the factor
$$\tan(\theta)\,\kappa_2(A)^2 = \frac{\rho_{LS}}{\sqrt{\|b\|_2^2 - \rho_{LS}^2}}\,\kappa_2(A)^2.$$
Thus, in nonzero residual problems it is the square of the condition that measures the sensitivity of $x_{LS}$. In contrast, residual sensitivity depends just linearly on $\kappa_2(A)$. These dependencies are confirmed by Example 5.3.1.
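To connect the theorem with Example 5.3.1, the sketch below evaluates the first-order bounds (5.3.8) and (5.3.9), with the $O(\epsilon^2)$ terms dropped, for that example's data and compares them with the observed changes (NumPy assumed). Here $\epsilon = 10^{-8}$, $\kappa_2(A) = 10^6$ and $\theta = \pi/4$. The $\tan(\theta)\,\kappa_2(A)^2$ term dominates the solution bound, and the observed relative change of about $10^4$ nearly attains it.

```python
import numpy as np

A  = np.array([[1.0, 0.0], [0.0, 1e-6], [0.0, 0.0]])
dA = np.array([[0.0, 0.0], [0.0, 0.0], [0.0, 1e-8]])
b  = np.array([1.0, 0.0, 1.0])
db = np.zeros(3)
m, n = A.shape

x,  *_ = np.linalg.lstsq(A, b, rcond=None)
xh, *_ = np.linalg.lstsq(A + dA, b + db, rcond=None)
r, rh = b - A @ x, (b + db) - (A + dA) @ xh

s = np.linalg.svd(A, compute_uv=False)
kappa = s[0] / s[-1]
eps = max(np.linalg.norm(dA, 2) / s[0], np.linalg.norm(db) / np.linalg.norm(b))
sin_t = np.linalg.norm(r) / np.linalg.norm(b)      # sin(theta) = rho_LS / ||b||_2
cos_t = np.sqrt(1.0 - sin_t ** 2)

bound_x = eps * (2 * kappa / cos_t + (sin_t / cos_t) * kappa ** 2)   # (5.3.8), first order
bound_r = eps * (1 + 2 * kappa) * min(1, m - n)                     # (5.3.9), first order
rel_x = np.linalg.norm(xh - x) / np.linalg.norm(x)
rel_r = np.linalg.norm(rh - r) / np.linalg.norm(b)
print(rel_x, bound_x)
print(rel_r, bound_r)
```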


5.3.8 Normal Equations Versus QR


It is instructive to compare the normal equation and QR approaches to the 
LS problem. Recall the following main points from our discussion: 


* The sensitivity of the LS solution is roughly proportional to the quantity $\kappa_2(A) + \rho_{LS}\,\kappa_2(A)^2$.

* The method of normal equations produces an $\hat x_{LS}$ whose relative error depends on the square of the condition.

* The QR approach (Householder, Givens, careful MGS) solves a nearby LS problem and therefore produces a solution that has a relative error approximately given by $u\,(\kappa_2(A) + \rho_{LS}\,\kappa_2(A)^2)$.

Thus, we may conclude that if $\rho_{LS}$ is small and $\kappa_2(A)$ is large, then the method of normal equations does not solve a nearby problem and will usually render an LS solution that is less accurate than a stable QR approach. Conversely, the two methods produce comparably inaccurate results when applied to large residual, ill-conditioned problems.
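The experiment below illustrates this conclusion under assumptions of our own choosing: $m = 40$, $n = 6$, $\kappa_2(A) = 10^6$, and a right-hand side $b = Ax_{true} + \rho\,q$ with $q \perp \mathrm{ran}(A)$, so that $x_{true}$ is the exact LS solution and $\rho_{LS} = \rho$. With $\rho = 0$, the QR solution should be many orders of magnitude more accurate than the normal equations solution. With $\rho = 1$, the $\rho_{LS}\,\kappa_2(A)^2$ term affects both methods, and their errors become comparable.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 40, 6
U, _ = np.linalg.qr(rng.standard_normal((m, m)))       # orthonormal basis of R^m
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = U[:, :n] @ np.diag(np.logspace(0, -6, n)) @ V.T    # kappa_2(A) = 1e6
x_true = rng.standard_normal(n)
q_perp = U[:, n]                                        # unit vector orthogonal to ran(A)

def solve_ne(A, b):
    G = np.linalg.cholesky(A.T @ A)                     # normal equations (Algorithm 5.3.1)
    return np.linalg.solve(G.T, np.linalg.solve(G, A.T @ b))

def solve_qr(A, b):
    Q1, R1 = np.linalg.qr(A)                            # Householder-based thin QR
    return np.linalg.solve(R1, Q1.T @ b)

def rel_err(x):
    return np.linalg.norm(x - x_true) / np.linalg.norm(x_true)

errs = {}
for rho in (0.0, 1.0):                                  # zero vs. large residual
    b = A @ x_true + rho * q_perp                       # x_true is the LS solution
    errs[rho] = (rel_err(solve_ne(A, b)), rel_err(solve_qr(A, b)))
    print(f"rho_LS = {rho}: normal equations {errs[rho][0]:.1e}, QR {errs[rho][1]:.1e}")
```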

Finally, we mention two other factors that figure in the debate about 
QR versus normal equations: 


* The normal equations approach involves about half of the arithmetic when $m \gg n$ and does not require as much storage.


* QR approaches are applicable to a wider class of matrices because 
the Cholesky process applied to AT A breaks down "before" the back 
substitution process on QT A = R. 


At the very minimum, this discussion should convince you how difficult it 
can be to choose the "right" algorithm! 


Problems 


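P5.3.1 Assume $A^TAx = A^Tb$, $(A^TA + F)\hat x = A^Tb$, and $2\|F\|_2 \le \sigma_n(A)^2$. Show that if $r = b - Ax$ and $\hat r = b - A\hat x$, then $\hat r - r = A(A^TA + F)^{-1}Fx$ and
$$\|\hat r - r\|_2 \le 2\kappa_2(A)\,\frac{\|F\|_2}{\|A\|_2}\,\|x\|_2.$$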


P5.3.2 Assume that $A^TAx = A^Tb$ and that $A^TA\hat x = A^Tb + f$ where $\|f\|_2 \le cu\,\|A^T\|_2\|b\|_2$ and $A$ has full column rank. Show that
$$\frac{\|x - \hat x\|_2}{\|x\|_2} \le cu\,\kappa_2(A)^2\,\frac{\|A^T\|_2\|b\|_2}{\|A^Tb\|_2}.$$


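P5.3.3 Let $A \in \mathbb{R}^{m\times n}$ with $m > n$ and $y \in \mathbb{R}^m$ and define $\bar A = [\,A\;\;y\,] \in \mathbb{R}^{m\times(n+1)}$. Show that $\sigma_1(\bar A) \ge \sigma_1(A)$ and $\sigma_{n+1}(\bar A) \le \sigma_n(A)$. Thus, the condition grows if a column is added to a matrix.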


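P5.3.4 Let $A \in \mathbb{R}^{m\times n}$ ($m \ge n$), $w \in \mathbb{R}^n$, and define
$$B = \begin{bmatrix} A \\ w^T \end{bmatrix}.$$
Show that $\sigma_n(B) \ge \sigma_n(A)$ and $\sigma_1(B) \le \sqrt{\|A\|_2^2 + \|w\|_2^2}$. Thus, the condition of a matrix may increase or decrease if a row is added.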

P5.3.5 (Cline 1973) Suppose that $A \in \mathbb{R}^{m\times n}$ has rank $n$ and that Gaussian elimination with partial pivoting is used to compute the factorization $PA = LU$, where $L \in \mathbb{R}^{m\times n}$ is unit lower triangular, $U \in \mathbb{R}^{n\times n}$ is upper triangular, and $P \in \mathbb{R}^{m\times m}$ is a permutation. Explain how the decomposition in P5.2.5 can be used to find a vector $z \in \mathbb{R}^n$ such that $\|Lz - Pb\|_2$ is minimized. Show that if $Ux = z$, then $\|Ax - b\|_2$ is minimum. Show that this method of solving the LS problem is more efficient than Householder QR from the flop point of view whenever $m < 5n/3$.

P5.3.6 The matrix $C = (A^TA)^{-1}$, where $\mathrm{rank}(A) = n$, arises in many statistical applications and is known as the variance-covariance matrix. Assume that the factorization $A = QR$ is available. (a) Show $C = (R^TR)^{-1}$. (b) Give an algorithm for computing the diagonal of $C$ that requires $n^3/3$ flops. (c) Show that
$$R = \begin{bmatrix} \alpha & v^T \\ 0 & S \end{bmatrix} \;\Rightarrow\; C = (R^TR)^{-1} = \begin{bmatrix} (1 + v^TC_1v)/\alpha^2 & -v^TC_1/\alpha \\ -C_1v/\alpha & C_1 \end{bmatrix}$$
where $C_1 = (S^TS)^{-1}$. (d) Using (c), give an algorithm that overwrites the upper triangular portion of $R$ with the upper triangular portion of $C$. Your algorithm should require $2n^3/3$ flops.

P5.3.7 Suppose $A \in \mathbb{R}^{n\times n}$ is symmetric and that $r = b - Ax$ where $r, b, x \in \mathbb{R}^n$ and $x$ is nonzero. Show how to compute a symmetric $E \in \mathbb{R}^{n\times n}$ with minimal Frobenius norm so that $(A + E)x = b$. Hint: Use the QR factorization of $[\,x, r\,]$ and note that $Ex = r \Rightarrow (Q^TEQ)(Q^Tx) = Q^Tr$.

P5.3.8 Show how to compute the nearest circulant matrix to a given Toeplitz matrix. Measure distance with the Frobenius norm.
5.4 Other Orthogonal Factorizations


If $A$ is rank deficient, then the QR factorization need not give a basis for $\mathrm{ran}(A)$. This problem can be corrected by computing the QR factorization of a column-permuted version of $A$, i.e., $A\Pi = QR$ where $\Pi$ is a permutation.

The "data" in $A$ can be compressed further if we permit right multiplication by a general orthogonal matrix $Z$:
$$Q^TAZ = T.$$


There are interesting choices for Q and Z and these, together with the 
column pivoted QR factorization, are discussed in this section. 


5.4.1 Rank Deficiency: QR with Column Pivoting


If $A \in \mathbb{R}^{m\times n}$ and $\mathrm{rank}(A) < n$, then the QR factorization does not necessarily produce an orthonormal basis for $\mathrm{ran}(A)$. For example, if $A$ has three columns and
$$A = [\,a_1, a_2, a_3\,] = [\,q_1, q_2, q_3\,]\begin{bmatrix} 1 & 1 & 1 \\ 0 & 0 & 1 \\ 0 & 0 & 1 \end{bmatrix}$$
is its QR factorization, then $\mathrm{rank}(A) = 2$ but $\mathrm{ran}(A)$ does not equal any of the subspaces $\mathrm{span}\{q_1, q_2\}$, $\mathrm{span}\{q_1, q_3\}$, or $\mathrm{span}\{q_2, q_3\}$.

Fortunately, the Householder QR factorization procedure (Algorithm 5.2.1) can be modified in a simple way to produce an orthonormal basis for $\mathrm{ran}(A)$. The modified algorithm computes the factorization
$$Q^TA\Pi = \begin{bmatrix} R_{11} & R_{12} \\ 0 & 0 \end{bmatrix}\begin{matrix} r \\ m-r \end{matrix} \qquad (5.4.1)$$
(column blocks of width $r$ and $n-r$), where $r = \mathrm{rank}(A)$, $Q$ is orthogonal, $R_{11}$ is upper triangular and nonsingular, and $\Pi$ is a permutation. If we have the column partitionings $A\Pi = [\,a_{c_1}, \ldots, a_{c_n}\,]$ and $Q = [\,q_1, \ldots, q_m\,]$, then for $k = 1{:}n$ we have
$$a_{c_k} = \sum_{i=1}^{\min\{r,k\}} r_{ik}\,q_i \in \mathrm{span}\{q_1, \ldots, q_r\}$$
implying
$$\mathrm{ran}(A) = \mathrm{span}\{q_1, \ldots, q_r\}.$$
The matrices Q and $\Pi$ are products of Householder matrices and interchange
matrices, respectively. Assume for some k that we have computed
Householder matrices $H_1, \ldots, H_{k-1}$ and permutations $\Pi_1, \ldots, \Pi_{k-1}$ such
that

$$(H_{k-1}\cdots H_1)\,A\,(\Pi_1\cdots\Pi_{k-1}) = R^{(k-1)} = \begin{bmatrix} R_{11}^{(k-1)} & R_{12}^{(k-1)} \\ 0 & R_{22}^{(k-1)} \end{bmatrix}, \qquad R_{11}^{(k-1)} \in \mathbb{R}^{(k-1)\times(k-1)}, \qquad (5.4.2)$$

where $R_{11}^{(k-1)}$ is a nonsingular and upper triangular matrix. Now suppose
that

$$R_{22}^{(k-1)} = [\,z_k^{(k-1)}, \ldots, z_n^{(k-1)}\,]$$

is a column partitioning and let $p \ge k$ be the smallest index such that

$$\| z_p^{(k-1)} \|_2 = \max\{\| z_k^{(k-1)} \|_2, \ldots, \| z_n^{(k-1)} \|_2\}. \qquad (5.4.3)$$

Note that if k − 1 = rank(A), then this maximum is zero and we are finished.
Otherwise, let $\Pi_k$ be the n-by-n identity with columns p and k interchanged
and determine a Householder matrix $H_k$ such that if $R^{(k)} = H_kR^{(k-1)}\Pi_k$,
then $R^{(k)}(k+1{:}m, k) = 0$. In other words, $\Pi_k$ moves the largest column in
$R_{22}^{(k-1)}$ to the lead position and $H_k$ zeroes all of its subdiagonal components.

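The column norms do not have to be recomputed at each stage if we
exploit the property

$$Q^Tz = \begin{bmatrix} \alpha \\ w \end{bmatrix}, \quad \alpha \in \mathbb{R}, \; w \in \mathbb{R}^{s-1} \quad\Longrightarrow\quad \| w \|_2^2 = \| z \|_2^2 - \alpha^2,$$

which holds for any orthogonal matrix $Q \in \mathbb{R}^{s\times s}$. This reduces the overhead
associated with column pivoting from $O(mn^2)$ flops to $O(mn)$ flops because
we can get the new column norms by updating the old column norms, e.g.,

$$\| z_j^{(k)} \|_2^2 = \| z_j^{(k-1)} \|_2^2 - r_{kj}^2.$$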

Combining all of the above we obtain the following algorithm established
by Businger and Golub (1965):

Algorithm 5.4.1 (Householder QR With Column Pivoting) Given
$A \in \mathbb{R}^{m\times n}$ with $m \ge n$, the following algorithm computes r = rank(A)
and the factorization (5.4.1) with $Q = H_1\cdots H_r$ and $\Pi = \Pi_1\cdots\Pi_r$. The
upper triangular part of A is overwritten by the upper triangular part of
R and components j + 1:m of the jth Householder vector are stored in
A(j + 1:m, j). The permutation $\Pi$ is encoded in an integer vector piv. In
particular, $\Pi_j$ is the identity with rows j and piv(j) interchanged.
    for j = 1:n
        c(j) = A(1:m, j)^T A(1:m, j)
    end
    r = 0; τ = max{c(1), ..., c(n)}
    Find smallest k with 1 ≤ k ≤ n so c(k) = τ
    while τ > 0
        r = r + 1
        piv(r) = k; A(1:m, r) ↔ A(1:m, k); c(r) ↔ c(k)
        [v, β] = house(A(r:m, r))
        A(r:m, r:n) = (I_{m-r+1} - βvv^T) A(r:m, r:n)
        A(r+1:m, r) = v(2:m-r+1)
        for i = r+1:n
            c(i) = c(i) - A(r, i)^2
        end
        if r < n
            τ = max{c(r+1), ..., c(n)}
            Find smallest k with r+1 ≤ k ≤ n so c(k) = τ
        else
            τ = 0
        end
    end


This algorithm requires $4mnr - 2r^2(m+n) + 4r^3/3$ flops where r = rank(A).
As with the nonpivoting procedure, Algorithm 5.2.1, the orthogonal matrix
Q is stored in factored form in the subdiagonal portion of A.
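To make Algorithm 5.4.1 concrete, here is a minimal NumPy sketch. It is not a definitive implementation, and the names `house`, `qr_column_pivoting`, and `explicit_q` are hypothetical. Two assumptions depart from the text. First, the exact test "τ > 0" is replaced by a relative threshold `rel_tol` on the squared norms, because downdated norms are rarely exactly zero in floating point. Second, `perm` stores the final column order directly instead of the interchange record piv. The `house` helper is written to mimic the house() function the algorithm calls, with an assumed sign convention. The test matrix is the one used in Example 5.4.1.

```python
import numpy as np

def house(x):
    """Householder vector v (v[0] = 1) and beta with (I - beta v v^T) x = ||x||_2 e_1."""
    x = np.asarray(x, dtype=float)
    sigma = float(x[1:] @ x[1:])
    v = x.copy()
    if sigma == 0.0:
        v[0] = 1.0
        return v, 0.0
    mu = np.sqrt(x[0] ** 2 + sigma)
    v[0] = x[0] - mu if x[0] <= 0 else -sigma / (x[0] + mu)
    beta = 2.0 * v[0] ** 2 / (sigma + v[0] ** 2)
    return v / v[0], beta

def qr_column_pivoting(A, rel_tol=1e-12):
    """Sketch of Algorithm 5.4.1. F holds R in its upper triangle and the
    essential parts of the Householder vectors below the diagonal."""
    F = np.array(A, dtype=float)
    m, n = F.shape
    perm = np.arange(n)
    c = np.sum(F * F, axis=0)          # c(j) = squared 2-norm of column j
    stop = rel_tol * c.max()           # floating-point stand-in for "tau > 0"
    betas, r = [], 0
    while r < n:
        k = r + int(np.argmax(c[r:]))  # argmax returns the smallest maximizing index
        if c[k] <= stop:
            break
        perm[[r, k]] = perm[[k, r]]    # interchange columns r and k
        F[:, [r, k]] = F[:, [k, r]]
        c[[r, k]] = c[[k, r]]
        v, beta = house(F[r:, r])
        F[r:, r:] -= beta * np.outer(v, v @ F[r:, r:])
        F[r + 1:, r] = v[1:]           # store the essential part of v
        c[r + 1:] -= F[r, r + 1:] ** 2  # norm downdating, no recomputation
        betas.append(beta)
        r += 1
    return F, betas, perm, r

def explicit_q(F, betas):
    """Form Q = H_1 ... H_r from the factored form by backward accumulation."""
    Q = np.eye(F.shape[0])
    for j in reversed(range(len(betas))):
        v = np.concatenate(([1.0], F[j + 1:, j]))
        Q[j:, :] -= betas[j] * np.outer(v, v @ Q[j:, :])
    return Q

A = np.array([[1., 2., 3.], [1., 5., 6.], [1., 8., 9.], [1., 11., 12.]])
F, betas, perm, r = qr_column_pivoting(A)
Q = explicit_q(F, betas)
R = np.zeros_like(A)
R[:r, :] = np.triu(F[:r, :])
print("rank estimate:", r, " column order:", perm)
print("A[:, perm] == QR:", np.allclose(Q @ R, A[:, perm]))
```

After the first step, the two remaining columns of this particular matrix have equal norms in exact arithmetic. The second pivot is therefore decided by roundoff, and the computed order may differ from the $\Pi$ printed in the example, while the rank and the factorization are unaffected. Norm downdating subtracts nearly equal quantities, so robust implementations recompute a column norm once its downdated value has lost most of its accuracy. This sketch omits that safeguard.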


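Example 5.4.1 If Algorithm 5.4.1 is applied to

$$A = \begin{bmatrix} 1 & 2 & 3 \\ 1 & 5 & 6 \\ 1 & 8 & 9 \\ 1 & 11 & 12 \end{bmatrix},$$

then $\Pi = [\,e_3\; e_2\; e_1\,]$ and to three significant digits we obtain

$$A\Pi = QR = \begin{bmatrix} -.182 & -.816 & .514 & .191 \\ -.365 & -.408 & -.827 & .129 \\ -.548 & -.000 & .113 & -.829 \\ -.730 & .408 & .200 & .510 \end{bmatrix} \begin{bmatrix} -16.4 & -14.6 & -1.82 \\ 0.00 & .816 & -.816 \\ 0.00 & .000 & .000 \\ 0.00 & .000 & .000 \end{bmatrix}.$$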


5.4.2 Complete Orthogonal Decompositions 


The matrix R produced by Algorithm 5.4.1 can be further reduced if it
is post-multiplied by an appropriate sequence of Householder matrices. In
particular, we can use Algorithm 5.2.1 to compute

$$Z_r\cdots Z_1\begin{bmatrix} R_{11}^T \\ R_{12}^T \end{bmatrix} = \begin{bmatrix} T_{11}^T \\ 0 \end{bmatrix}, \qquad T_{11} \in \mathbb{R}^{r\times r}, \qquad (5.4.4)$$

where the $Z_i$ are Householder transformations and $T_{11}^T$ is upper triangular.
It then follows that

$$Q^TAZ = T = \begin{bmatrix} T_{11} & 0 \\ 0 & 0 \end{bmatrix}, \qquad T_{11} \in \mathbb{R}^{r\times r}, \qquad (5.4.5)$$

where $Z = \Pi Z_1\cdots Z_r$. We refer to any decomposition of this form as a complete
orthogonal decomposition. Note that $\mathrm{null}(A) = \mathrm{ran}(Z(1{:}n, r+1{:}n))$.
See P5.2.5 for details about the exploitation of structure in (5.4.4).
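The two-step reduction (5.4.4)-(5.4.5) can be sketched with NumPy's LAPACK-based `np.linalg.qr`. To keep the example short, it assumes that rank(A) = r is known and that the leading r columns of A are independent. Under those assumptions, unpivoted QR stands in for Algorithm 5.4.1 ($\Pi = I$). The function name is hypothetical.

```python
import numpy as np

def complete_orthogonal_decomposition(A, r):
    """Return Q, T, Z with Q^T A Z = T = [[T11, 0], [0, 0]] for a rank-r matrix A.
    Simplifying assumption: the leading r columns of A are independent (Pi = I)."""
    Q, R = np.linalg.qr(A, mode="complete")            # A = QR, R is m-by-n
    # QR of [R11 R12]^T (n-by-r): [R11 R12]^T = Z [S; 0], so [R11 R12] Z = [S^T 0]
    Z, _ = np.linalg.qr(R[:r, :].T, mode="complete")   # Z is n-by-n orthogonal
    T = Q.T @ A @ Z                                     # T11 = S^T is lower triangular
    return Q, T, Z

rng = np.random.default_rng(0)
r = 3
A = rng.standard_normal((6, r)) @ rng.standard_normal((r, 5))   # 6-by-5 with rank 3
Q, T, Z = complete_orthogonal_decomposition(A, r)
N = Z[:, r:]                                # orthonormal basis for null(A)
print(np.round(T, 10))
print("||A N||_2 =", np.linalg.norm(A @ N, 2))
```

The last n − r columns of Z span null(A), which is why $\|AN\|_2$ comes out at roundoff level.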


5.4.3 Bidiagonalization


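Suppose $A \in \mathbb{R}^{m\times n}$ and $m \ge n$. We next show how to compute orthogonal
$U_B$ (m-by-m) and $V_B$ (n-by-n) such that

$$U_B^TAV_B = \begin{bmatrix} d_1 & f_1 & 0 & \cdots & 0 \\ 0 & d_2 & f_2 & \ddots & \vdots \\ \vdots & \ddots & \ddots & \ddots & 0 \\ \vdots & & \ddots & d_{n-1} & f_{n-1} \\ 0 & \cdots & \cdots & 0 & d_n \\ \hline & & 0 & & \end{bmatrix} \in \mathbb{R}^{m\times n}. \qquad (5.4.6)$$

$U_B = U_1\cdots U_n$ and $V_B = V_1\cdots V_{n-2}$ can each be determined as a product
of Householder matrices:

$$\begin{bmatrix} \times&\times&\times&\times\\ \times&\times&\times&\times\\ \times&\times&\times&\times\\ \times&\times&\times&\times\\ \times&\times&\times&\times \end{bmatrix} \xrightarrow{U_1} \begin{bmatrix} \times&\times&\times&\times\\ 0&\times&\times&\times\\ 0&\times&\times&\times\\ 0&\times&\times&\times\\ 0&\times&\times&\times \end{bmatrix} \xrightarrow{V_1} \begin{bmatrix} \times&\times&0&0\\ 0&\times&\times&\times\\ 0&\times&\times&\times\\ 0&\times&\times&\times\\ 0&\times&\times&\times \end{bmatrix} \xrightarrow{U_2} \begin{bmatrix} \times&\times&0&0\\ 0&\times&\times&\times\\ 0&0&\times&\times\\ 0&0&\times&\times\\ 0&0&\times&\times \end{bmatrix}$$

$$\xrightarrow{V_2} \begin{bmatrix} \times&\times&0&0\\ 0&\times&\times&0\\ 0&0&\times&\times\\ 0&0&\times&\times\\ 0&0&\times&\times \end{bmatrix} \xrightarrow{U_3} \begin{bmatrix} \times&\times&0&0\\ 0&\times&\times&0\\ 0&0&\times&\times\\ 0&0&0&\times\\ 0&0&0&\times \end{bmatrix} \xrightarrow{U_4} \begin{bmatrix} \times&\times&0&0\\ 0&\times&\times&0\\ 0&0&\times&\times\\ 0&0&0&\times\\ 0&0&0&0 \end{bmatrix}$$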
In general, $U_k$ introduces zeros into the kth column, while $V_k$ zeros the
appropriate entries in row k. Overall we have:

Algorithm 5.4.2 (Householder Bidiagonalization) Given $A \in \mathbb{R}^{m\times n}$
with $m \ge n$, the following algorithm overwrites A with $U_B^TAV_B = B$ where
B is upper bidiagonal and $U_B = U_1\cdots U_n$ and $V_B = V_1\cdots V_{n-2}$. The
essential part of $U_j$'s Householder vector is stored in A(j + 1:m, j) and the
essential part of $V_j$'s Householder vector is stored in A(j, j + 2:n).


    for j = 1:n
        [v, β] = house(A(j:m, j))
        A(j:m, j:n) = (I_{m-j+1} - βvv^T) A(j:m, j:n)
        A(j+1:m, j) = v(2:m-j+1)
        if j ≤ n - 2
            [v, β] = house(A(j, j+1:n)^T)
            A(j:m, j+1:n) = A(j:m, j+1:n)(I_{n-j} - βvv^T)
            A(j, j+2:n) = v(2:n-j)^T
        end
    end


This algorithm requires $4mn^2 - 4n^3/3$ flops. Such a technique is used in
Golub and Kahan (1965), where bidiagonalization is first described. If the
matrices $U_B$ and $V_B$ are explicitly desired, then they can be accumulated
in $4m^2n - 4n^3/3$ and $4n^3/3$ flops, respectively. The bidiagonalization of A
is related to the tridiagonalization of $A^TA$. See §8.2.1.


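Example 5.4.2 If Algorithm 5.4.2 is applied to

$$A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \\ 10 & 11 & 12 \end{bmatrix},$$

then to three significant digits we obtain

$$B = \begin{bmatrix} -12.8 & 21.8 & 0 \\ 0 & 2.24 & -.613 \\ 0 & 0 & .000 \\ 0 & 0 & 0 \end{bmatrix}, \qquad V_B = \begin{bmatrix} 1.00 & .000 & .000 \\ .000 & -.667 & -.745 \\ .000 & -.745 & .667 \end{bmatrix},$$

$$U_B = \begin{bmatrix} -.0776 & -.833 & .392 & -.383 \\ -.3110 & -.451 & -.238 & .802 \\ -.5430 & -.069 & -.701 & -.457 \\ -.7760 & .312 & .547 & .037 \end{bmatrix}.$$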
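A NumPy sketch of Algorithm 5.4.2, applied to the matrix of Example 5.4.2. Unlike the algorithm, which stores the Householder vectors in A, this version accumulates $U_B$ and $V_B$ explicitly so the result can be checked. The `house` helper mimics the house() function with an assumed sign convention. As a result, the signs of the entries of B, $U_B$, and $V_B$ may differ from those printed in the example while the magnitudes agree. The function names are hypothetical.

```python
import numpy as np

def house(x):
    """Householder vector v (v[0] = 1) and beta with (I - beta v v^T) x = ||x||_2 e_1."""
    x = np.asarray(x, dtype=float)
    sigma = float(x[1:] @ x[1:])
    v = x.copy()
    if sigma == 0.0:
        v[0] = 1.0
        return v, 0.0
    mu = np.sqrt(x[0] ** 2 + sigma)
    v[0] = x[0] - mu if x[0] <= 0 else -sigma / (x[0] + mu)
    beta = 2.0 * v[0] ** 2 / (sigma + v[0] ** 2)
    return v / v[0], beta

def bidiagonalize(A):
    """Sketch of Algorithm 5.4.2 returning U_B, B, V_B with U_B^T A V_B = B."""
    B = np.array(A, dtype=float)
    m, n = B.shape
    U, V = np.eye(m), np.eye(n)
    for j in range(n):
        v, beta = house(B[j:, j])                      # U_j zeros B(j+1:m, j)
        B[j:, j:] -= beta * np.outer(v, v @ B[j:, j:])
        U[:, j:] -= beta * np.outer(U[:, j:] @ v, v)
        if j < n - 2:                                  # "j <= n-2" in 1-based indexing
            v, beta = house(B[j, j + 1:])              # V_j zeros B(j, j+2:n)
            B[j:, j + 1:] -= beta * np.outer(B[j:, j + 1:] @ v, v)
            V[:, j + 1:] -= beta * np.outer(V[:, j + 1:] @ v, v)
    return U, B, V

A = np.arange(1.0, 13.0).reshape(4, 3)      # the matrix of Example 5.4.2
U, B, V = bidiagonalize(A)
print(np.round(B, 3))
print("U^T A V == B:", np.allclose(U.T @ A @ V, B))
```

Because A has rank 2, the computed $d_3$ is at roundoff level. The singular values of B and A coincide, since the transformations are orthogonal.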


5.4.4  R-Bidiagonalization 


A faster method of bidiagonalizing when $m \gg n$ results if we upper triangularize
A first before applying Algorithm 5.4.2. In particular, suppose we
compute an orthogonal $Q \in \mathbb{R}^{m\times m}$ such that

$$Q^TA = \begin{bmatrix} R_1 \\ 0 \end{bmatrix}, \qquad R_1 \in \mathbb{R}^{n\times n},$$

is upper triangular. We then bidiagonalize the square matrix $R_1$,

$$U_R^TR_1V_R = B_1.$$

Here $U_R$ and $V_R$ are n-by-n orthogonal and $B_1$ is n-by-n upper bidiagonal.
If $U_B = Q\,\mathrm{diag}(U_R, I_{m-n})$ then

$$U_B^TAV_R = \begin{bmatrix} B_1 \\ 0 \end{bmatrix} \equiv B$$

is a bidiagonalization of A.

The idea of computing the bidiagonalization in this manner is mentioned
in Lawson and Hanson (1974, p.119) and more fully analyzed in Chan
(1982a). We refer to this method as R-bidiagonalization. By comparing its
flop count ($2mn^2 + 2n^3$) with that for Algorithm 5.4.2 ($4mn^2 - 4n^3/3$) we see
that it involves fewer computations (approximately) whenever $m \ge 5n/3$.
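The crossover $m = 5n/3$ follows from equating the two leading-order flop counts quoted above. The equation $4mn^2 - 4n^3/3 = 2mn^2 + 2n^3$ reduces to $2mn^2 = 10n^3/3$. A few lines of Python make the comparison explicit. The function names are ours, and the counts are only the leading-order estimates from the text.

```python
def flops_householder_bidiag(m, n):
    """Leading-order count for Algorithm 5.4.2: 4mn^2 - 4n^3/3."""
    return 4 * m * n**2 - 4 * n**3 / 3

def flops_r_bidiag(m, n):
    """Leading-order count for R-bidiagonalization (QR first, then bidiagonalize R1)."""
    return 2 * m * n**2 + 2 * n**3

n = 300
for m in (300, 500, 600, 3000):          # m = 500 is exactly 5n/3
    gk, rb = flops_householder_bidiag(m, n), flops_r_bidiag(m, n)
    verdict = "R-bidiagonalization cheaper" if rb < gk else "Algorithm 5.4.2 no worse"
    print(f"m = {m:5d}: {gk:.3e} vs {rb:.3e}  ({verdict})")
```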


5.4.5 The SVD and its Computation 


Once the bidiagonalization of A has been achieved, the next step in the
Golub-Reinsch SVD algorithm is to zero the superdiagonal elements in B.
This is an iterative process and is accomplished by an algorithm due to
Golub and Kahan (1965). Unfortunately, we must defer our discussion of
this iteration until §8.6 as it requires an understanding of the symmetric
eigenvalue problem. Suffice it to say here that it computes orthogonal
matrices $U_\Sigma$ and $V_\Sigma$ such that

$$U_\Sigma^TBV_\Sigma = \Sigma = \mathrm{diag}(\sigma_1, \ldots, \sigma_n) \in \mathbb{R}^{m\times n}.$$

By defining $U = U_BU_\Sigma$ and $V = V_BV_\Sigma$ we see that $U^TAV = \Sigma$ is the SVD
of A. The flop counts associated with this portion of the algorithm depend
upon "how much" of the SVD is required. For example, when solving the
LS problem, $U^T$ need never be explicitly formed but merely applied to b
as it is developed. In other applications, only the matrix $U_1 = U(:, 1{:}n)$
is required. Altogether there are six possibilities and the total amount of
work required by the SVD algorithm in each case is summarized in the
table below. Because of the two possible bidiagonalization schemes, there
are two columns of flop counts. If the bidiagonalization is achieved via
Algorithm 5.4.2, the Golub-Reinsch (1970) SVD algorithm results, while if
R-bidiagonalization is invoked we obtain the R-SVD algorithm detailed in
Chan (1982a). By comparing the entries in this table (which are meant only
as approximate estimates of work), we conclude that the R-SVD approach
is more efficient unless $m \approx n$.
Required             Golub-Reinsch SVD              R-SVD
$\Sigma$             $4mn^2 - 4n^3/3$               $2mn^2 + 2n^3$
$\Sigma, V$          $4mn^2 + 8n^3$                 $2mn^2 + 11n^3$
$\Sigma, U$          $4m^2n - 8mn^2$                $4m^2n + 13n^3$
$\Sigma, U_1$        $14mn^2 - 2n^3$                $6mn^2 + 11n^3$
$\Sigma, U, V$       $4m^2n + 8mn^2 + 9n^3$         $4m^2n + 22n^3$
$\Sigma, U_1, V$     $14mn^2 + 8n^3$                $6mn^2 + 20n^3$


Problems 


P5.4.1 Suppose $A \in \mathbb{R}^{m\times n}$ with $m < n$. Give an algorithm for computing the factorization

$$U^TAV = [\,B \;\; 0\,]$$

where B is an m-by-m upper bidiagonal matrix. (Hint: Obtain the form

$$\begin{bmatrix} \times&\times&0&0&0 \\ 0&\times&\times&0&0 \\ 0&0&\times&\times&0 \\ 0&0&0&\times&\times \end{bmatrix}$$

using Householder matrices and then "chase" the (m, m + 1) entry up the (m + 1)st
column by applying Givens rotations from the right.)

P5.4.2 Show how to efficiently bidiagonalize an n-by-n upper triangular matrix using
Givens rotations.

P5.4.3 Show how to upper bidiagonalize a tridiagonal matrix $T \in \mathbb{R}^{n\times n}$ using Givens
rotations.

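P5.4.4 Let $A \in \mathbb{R}^{m\times n}$ and assume that $0 \ne v$ satisfies $\| Av \|_2 = \sigma_n(A)\| v \|_2$. Let $\Pi$
be a permutation such that if $\Pi^Tv = w$, then $|w_n| = \| w \|_\infty$. Show that if $A\Pi = QR$
is the QR factorization of $A\Pi$, then $|r_{nn}| \le \sqrt{n}\,\sigma_n(A)$. Thus, there always exists a
permutation $\Pi$ such that the QR factorization of $A\Pi$ "displays" near rank deficiency.

P5.4.5 Let $x, y \in \mathbb{R}^m$ and $Q \in \mathbb{R}^{m\times m}$ be given with Q orthogonal. Show that if

$$Q^Tx = \begin{bmatrix} \alpha \\ u \end{bmatrix}, \qquad Q^Ty = \begin{bmatrix} \beta \\ v \end{bmatrix}, \qquad \alpha, \beta \in \mathbb{R}, \; u, v \in \mathbb{R}^{m-1},$$

then $u^Tv = x^Ty - \alpha\beta$.

P5.4.6 Let $A = [\,a_1, \ldots, a_n\,] \in \mathbb{R}^{m\times n}$ and $b \in \mathbb{R}^m$ be given. For any subset of A's
columns $\{a_{c_1}, \ldots, a_{c_k}\}$ define

$$\mathrm{res}[\,a_{c_1}, \ldots, a_{c_k}\,] = \min_{x \in \mathbb{R}^k} \| [\,a_{c_1}, \ldots, a_{c_k}\,]x - b \|_2.$$

Describe an alternative pivot selection procedure for Algorithm 5.4.1 such that if
$QR = A\Pi = [\,a_{c_1}, \ldots, a_{c_n}\,]$ in the final factorization, then for k = 1:n:

$$\mathrm{res}[\,a_{c_1}, \ldots, a_{c_k}\,] = \min_{i \ge k} \mathrm{res}[\,a_{c_1}, \ldots, a_{c_{k-1}}, a_{c_i}\,].$$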

Notes and References for Sec. 5.4
Aspects of the complete orthogonal decomposition are discussed in

R.J. Hanson and C.L. Lawson (1969). "Extensions and Applications of the Householder
Algorithm for Solving Linear Least Squares Problems," Math. Comp. 23, 787-812.
P.A. Wedin (1973). "On the Almost Rank-Deficient Case of the Least Squares Problem,"
BIT 13, 344-54.

G.H. Golub and V. Pereyra (1976). "Differentiation of Pseudo-Inverses, Separable Nonlinear
Least Squares Problems and Other Tales," in Generalized Inverses and Applications,
ed. M.Z. Nashed, Academic Press, New York, pp. 303-24.

The computation of the SVD is detailed in §8.6. But here are some of the standard
references concerned with its calculation:

G.H. Golub and W. Kahan (1965). "Calculating the Singular Values and Pseudo-Inverse
of a Matrix," SIAM J. Num. Anal. 2, 205-24.

P.A. Businger and G.H. Golub (1969). "Algorithm 358: Singular Value Decomposition
of a Complex Matrix," Comm. ACM 12, 564-65.

G.H. Golub and C. Reinsch (1970). "Singular Value Decomposition and Least Squares
Solutions," Numer. Math. 14, 403-20. See also Wilkinson and Reinsch (1971, pp.
134-51).

T.F. Chan (1982). "An Improved Algorithm for Computing the Singular Value Decomposition,"
ACM Trans. Math. Soft. 8, 72-83.


QR with column pivoting was first discussed in

P.A. Businger and G.H. Golub (1965). "Linear Least Squares Solutions by Householder
Transformations," Numer. Math. 7, 269-76. See also Wilkinson and Reinsch (1971,
pp. 111-18).

Knowing when to stop in the algorithm is difficult. In questions of rank deficiency, it is
helpful to obtain information about the smallest singular value of the upper triangular
matrix R. This can be done using the techniques of §3.5.4 or those that are discussed in

I. Karasalo (1974). "A Criterion for Truncation of the QR Decomposition Algorithm for
the Singular Linear Least Squares Problem," BIT 14, 156-66.

N. Anderson and I. Karasalo (1975). "On Computing Bounds for the Least Singular
Value of a Triangular Matrix," BIT 15, 1-4.


Other aspects of rank estimation with QR are discussed in

L.V. Foster (1988). "Rank and Null Space Calculations Using Matrix Decomposition
without Column Interchanges," Lin. Alg. and Its Applic. 74, 47-71.

T.F. Chan (1987). "Rank Revealing QR Factorizations," Lin. Alg. and Its Applic.
88/89, 67-82.

T.F. Chan and P. Hansen (1992). "Some Applications of the Rank Revealing QR Factorization,"
SIAM J. Sci. and Stat. Comp. 13, 721-741.

J.L. Barlow and U.B. Vemulapati (1992). "Rank Detection Methods for Sparse Matrices,"
SIAM J. Matrix Anal. Appl. 13, 1279-1297.

T-M. Hwang, W-W. Lin, and E.K. Yang (1992). "Rank-Revealing LU Factorizations,"
Lin. Alg. and Its Applic. 175, 115-141.

C.H. Bischof and P.C. Hansen (1992). "A Block Algorithm for Computing Rank-Revealing
QR Factorizations," Numerical Algorithms 2, 371-392.

S. Chandrasekaran and I.C.F. Ipsen (1994). "On Rank-Revealing Factorizations," SIAM
J. Matrix Anal. Appl. 15, 592-622.

R.D. Fierro and P.C. Hansen (1995). "Accuracy of TSVD Solutions Computed from
Rank-Revealing Decompositions," Numer. Math. 70, 452-472.


256 CHAPTER 5. ORTHOGONALIZATION AND LEAST SQUARES 


5.5 The Rank Deficient LS Problem 


If A is rank deficient, then there are an infinite number of solutions to the
LS problem and we must resort to special techniques. These techniques
must address the difficult problem of numerical rank determination.

After some SVD preliminaries, we show how QR with column pivoting
can be used to determine a minimizer $x_B$ with the property that $Ax_B$ is a
linear combination of r = rank(A) columns. We then discuss the minimum
2-norm solution that can be obtained from the SVD.


5.5.1 The Minimum Norm Solution 


Suppose $A \in \mathbb{R}^{m\times n}$ and rank(A) = r < n. The rank deficient LS problem
has an infinite number of solutions, for if x is a minimizer and $z \in \mathrm{null}(A)$
then x + z is also a minimizer. The set of all minimizers

$$\mathcal{X} = \{\, x \in \mathbb{R}^n : \| Ax - b \|_2 = \min \,\}$$

is convex, for if $x_1, x_2 \in \mathcal{X}$ and $\lambda \in [0, 1]$, then

$$\| A(\lambda x_1 + (1-\lambda)x_2) - b \|_2 \le \lambda\| Ax_1 - b \|_2 + (1-\lambda)\| Ax_2 - b \|_2 = \min_x \| Ax - b \|_2.$$

Thus, $\lambda x_1 + (1-\lambda)x_2 \in \mathcal{X}$. It follows that $\mathcal{X}$ has a unique element having
minimum 2-norm and we denote this solution by $x_{LS}$. (Note that in the
full rank case, there is only one LS solution and so it must have minimal
2-norm. Thus, we are consistent with the notation in §5.3.)


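5.5.2 Complete Orthogonal Factorization and $x_{LS}$

Any complete orthogonal factorization can be used to compute $x_{LS}$. In
particular, if Q and Z are orthogonal matrices such that

$$Q^TAZ = T = \begin{bmatrix} T_{11} & 0 \\ 0 & 0 \end{bmatrix}, \qquad T_{11} \in \mathbb{R}^{r\times r}, \quad r = \mathrm{rank}(A),$$

then

$$\| Ax - b \|_2^2 = \| (Q^TAZ)Z^Tx - Q^Tb \|_2^2 = \| T_{11}w - c \|_2^2 + \| d \|_2^2$$

where

$$Z^Tx = \begin{bmatrix} w \\ y \end{bmatrix}, \quad w \in \mathbb{R}^r, \qquad Q^Tb = \begin{bmatrix} c \\ d \end{bmatrix}, \quad c \in \mathbb{R}^r.$$

Clearly, if x is to minimize the sum of squares, then we must have $w = T_{11}^{-1}c$.
For x to have minimal 2-norm, y must be zero, and thus,

$$x_{LS} = Z\begin{bmatrix} T_{11}^{-1}c \\ 0 \end{bmatrix}.$$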
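The formula $x_{LS} = Z[\,T_{11}^{-1}c;\ 0\,]$ can be checked numerically. The NumPy sketch below builds a complete orthogonal decomposition as in §5.4.2 under a simplifying assumption: the rank r is known and the leading r columns are independent, so $\Pi = I$. It then compares the result with NumPy's SVD-based `np.linalg.lstsq`, which returns the minimum-norm solution when given an explicit `rcond` cutoff. The name `min_norm_ls_cod` is hypothetical.

```python
import numpy as np

def min_norm_ls_cod(A, b, r):
    """x_LS = Z [T11^{-1} c; 0] from a complete orthogonal decomposition.
    Assumes rank(A) = r is known and the leading r columns of A are independent."""
    Q, R = np.linalg.qr(A, mode="complete")
    Z, S = np.linalg.qr(R[:r, :].T, mode="complete")   # [R11 R12] Z = [T11 0]
    T11 = S[:r, :].T                                     # lower triangular
    c = (Q.T @ b)[:r]                                    # first r components of Q^T b
    w = np.linalg.solve(T11, c)                          # w = T11^{-1} c, and y = 0
    return Z[:, :r] @ w

rng = np.random.default_rng(1)
r = 3
A = rng.standard_normal((8, r)) @ rng.standard_normal((r, 6))   # 8-by-6 with rank 3
b = rng.standard_normal(8)
x = min_norm_ls_cod(A, b, r)
x_ref = np.linalg.lstsq(A, b, rcond=1e-10)[0]
print("||x - x_ref||_2 =", np.linalg.norm(x - x_ref))
```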
5.5.3 The SVD and the LS Problem

Of course, the SVD is a particularly revealing complete orthogonal decomposition.
It provides a neat expression for $x_{LS}$ and the norm of the
minimum residual $\rho_{LS} = \| Ax_{LS} - b \|_2$.


Theorem 5.5.1 Suppose $U^TAV = \Sigma$ is the SVD of $A \in \mathbb{R}^{m\times n}$ with r =
rank(A). If $U = [\,u_1, \ldots, u_m\,]$ and $V = [\,v_1, \ldots, v_n\,]$ are column partitionings
and $b \in \mathbb{R}^m$, then

$$x_{LS} = \sum_{i=1}^{r} \frac{u_i^Tb}{\sigma_i}\,v_i \qquad (5.5.1)$$

minimizes $\| Ax - b \|_2$ and has the smallest 2-norm of all minimizers. Moreover

$$\rho_{LS}^2 = \| Ax_{LS} - b \|_2^2 = \sum_{i=r+1}^{m} (u_i^Tb)^2. \qquad (5.5.2)$$

Proof. For any $x \in \mathbb{R}^n$ we have:

$$\| Ax - b \|_2^2 = \| (U^TAV)(V^Tx) - U^Tb \|_2^2 = \| \Sigma\alpha - U^Tb \|_2^2 = \sum_{i=1}^{r} (\sigma_i\alpha_i - u_i^Tb)^2 + \sum_{i=r+1}^{m} (u_i^Tb)^2$$

where $\alpha = V^Tx$. Clearly, if x solves the LS problem, then $\alpha_i = (u_i^Tb/\sigma_i)$ for
i = 1:r. If we set $\alpha(r+1{:}n) = 0$, then the resulting x clearly has minimal
2-norm. □
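Theorem 5.5.1 translates directly into a few lines of NumPy. The theorem is stated for the exact rank r. The cutoff `rel_tol`, which decides which computed singular values count as zero, is an assumption of this sketch, and `svd_ls` is a hypothetical name.

```python
import numpy as np

def svd_ls(A, b, rel_tol=1e-10):
    """x_LS from (5.5.1) and rho_LS from (5.5.2), treating sigma_i <= rel_tol*sigma_1 as zero."""
    U, s, Vt = np.linalg.svd(A)              # full U (m-by-m) and V^T (n-by-n)
    r = int(np.sum(s > rel_tol * s[0]))
    beta = U.T @ b                           # beta_i = u_i^T b
    x = Vt[:r].T @ (beta[:r] / s[:r])        # sum_{i <= r} (u_i^T b / sigma_i) v_i
    rho = np.sqrt(np.sum(beta[r:] ** 2))     # sqrt(sum_{i > r} (u_i^T b)^2)
    return x, rho, r

rng = np.random.default_rng(1)
A = rng.standard_normal((7, 3)) @ rng.standard_normal((3, 5))   # rank 3
b = rng.standard_normal(7)
x, rho, r = svd_ls(A, b)
print(r, rho, np.linalg.norm(A @ x - b))
```

The two printed residual norms agree. Also, x has no component along $v_{r+1}, \ldots, v_n$, which is what makes it the minimum-norm minimizer.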


5.5.4 The Pseudo-Inverse 
Note that if we define the matrix $A^+ \in \mathbb{R}^{n\times m}$ by $A^+ = V\Sigma^+U^T$ where

$$\Sigma^+ = \mathrm{diag}\left(\frac{1}{\sigma_1}, \ldots, \frac{1}{\sigma_r}, 0, \ldots, 0\right) \in \mathbb{R}^{n\times m}, \qquad r = \mathrm{rank}(A),$$

then $x_{LS} = A^+b$ and $\rho_{LS} = \| (I - AA^+)b \|_2$. $A^+$ is referred to as the
pseudo-inverse of A. It is the unique minimal Frobenius norm solution to
the problem

$$\min_{X \in \mathbb{R}^{n\times m}} \| AX - I_m \|_F. \qquad (5.5.3)$$

If rank(A) = n, then $A^+ = (A^TA)^{-1}A^T$, while if m = n = rank(A), then
$A^+ = A^{-1}$. Typically, $A^+$ is defined to be the unique matrix $X \in \mathbb{R}^{n\times m}$
that satisfies the four Moore-Penrose conditions:

(i) $AXA = A$          (iii) $(AX)^T = AX$
(ii) $XAX = X$         (iv) $(XA)^T = XA$.

These conditions amount to the requirement that $AA^+$ and $A^+A$ be orthogonal
projections onto ran(A) and ran($A^T$), respectively. Indeed, $AA^+ =
U_1U_1^T$ where $U_1 = U(1{:}m, 1{:}r)$ and $A^+A = V_1V_1^T$ where $V_1 = V(1{:}n, 1{:}r)$.
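A quick numerical check of the four Moore-Penrose conditions for $A^+ = V\Sigma^+U^T$. It uses a hypothetical helper `pinv_svd` whose relative cutoff `rel_tol` (an assumption of this sketch) decides the rank. The test also compares the result with $(A^TA)^{-1}A^T$ in the full column rank case.

```python
import numpy as np

def pinv_svd(A, rel_tol=1e-10):
    """A^+ = V Sigma^+ U^T, treating singular values <= rel_tol*sigma_1 as zero."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    r = int(np.sum(s > rel_tol * s[0]))
    return Vt[:r].T @ np.diag(1.0 / s[:r]) @ U[:, :r].T

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 4))   # rank 2
X = pinv_svd(A)
conditions = {
    "(i)   AXA = A":      np.allclose(A @ X @ A, A),
    "(ii)  XAX = X":      np.allclose(X @ A @ X, X),
    "(iii) (AX)^T = AX":  np.allclose((A @ X).T, A @ X),
    "(iv)  (XA)^T = XA":  np.allclose((X @ A).T, X @ A),
}
for name, ok in conditions.items():
    print(name, ok)
```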


5.5.5 Some Sensitivity Issues 


In §5.3 we examined the sensitivity of the full rank LS problem. The behavior
of $x_{LS}$ in this situation is summarized in Theorem 5.3.1. If we drop
the full rank assumptions then $x_{LS}$ is not even a continuous function of the
data and small changes in A and b can induce arbitrarily large changes in
$x_{LS} = A^+b$. The easiest way to see this is to consider the behavior of the
pseudo-inverse. If A and δA are in $\mathbb{R}^{m\times n}$, then Wedin (1973) and Stewart
(1975) show that

$$\| (A + \delta A)^+ - A^+ \|_F \le 2\| \delta A \|_F \max\{\| A^+ \|_2^2, \| (A + \delta A)^+ \|_2^2\}.$$

This inequality is a generalization of Theorem 2.3.4 in which perturbations
in the matrix inverse are bounded. However, unlike the square nonsingular
case, the upper bound does not necessarily tend to zero as δA tends to zero.
If

$$A = \begin{bmatrix} 1 & 0 \\ 0 & 0 \\ 0 & 0 \end{bmatrix} \qquad\text{and}\qquad \delta A = \begin{bmatrix} 0 & 0 \\ 0 & \epsilon \\ 0 & 0 \end{bmatrix}$$

then

$$A^+ = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} \qquad\text{and}\qquad (A + \delta A)^+ = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1/\epsilon & 0 \end{bmatrix},$$

and $\| A^+ - (A + \delta A)^+ \|_2 = 1/\epsilon$. The numerical determination of an LS
minimizer in the presence of such discontinuities is a major challenge.
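The discontinuity is easy to reproduce. The sketch below uses a hypothetical `pinv_svd` helper with an absolute cutoff `abs_tol`. With a tiny cutoff the gap grows like $1/\epsilon$. A cutoff larger than ε treats A + δA as a rank-one matrix, and the gap disappears, which shows how much the answer depends on the rank decision.

```python
import numpy as np

def pinv_svd(A, abs_tol):
    """Pseudo-inverse from the SVD; singular values <= abs_tol are treated as zero."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    r = int(np.sum(s > abs_tol))
    return Vt[:r].T @ np.diag(1.0 / s[:r]) @ U[:, :r].T

A = np.array([[1.0, 0.0], [0.0, 0.0], [0.0, 0.0]])
for eps in (1e-2, 1e-4, 1e-6):
    dA = np.array([[0.0, 0.0], [0.0, eps], [0.0, 0.0]])
    gap = np.linalg.norm(pinv_svd(A, 1e-12) - pinv_svd(A + dA, 1e-12), 2)
    print(f"||dA||_2 = {eps:.0e}   ||A^+ - (A+dA)^+||_2 = {gap:.3e}")
```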
5.5.6 QR with Column Pivoting and Basic Solutions 


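Suppose $A \in \mathbb{R}^{m\times n}$ has rank r. QR with column pivoting (Algorithm 5.4.1)
produces the factorization $A\Pi = QR$ where

$$R = \begin{bmatrix} R_{11} & R_{12} \\ 0 & 0 \end{bmatrix}, \qquad R_{11} \in \mathbb{R}^{r\times r}, \quad R_{12} \in \mathbb{R}^{r\times(n-r)}.$$

Given this reduction, the LS problem can be readily solved. Indeed, for
any $x \in \mathbb{R}^n$ we have

$$\| Ax - b \|_2^2 = \| (Q^TA\Pi)(\Pi^Tx) - (Q^Tb) \|_2^2 = \| R_{11}y - (c - R_{12}z) \|_2^2 + \| d \|_2^2$$

where

$$\Pi^Tx = \begin{bmatrix} y \\ z \end{bmatrix}, \quad y \in \mathbb{R}^r, \qquad\text{and}\qquad Q^Tb = \begin{bmatrix} c \\ d \end{bmatrix}, \quad c \in \mathbb{R}^r.$$

Thus, if x is an LS minimizer, then we must have

$$x = \Pi\begin{bmatrix} R_{11}^{-1}(c - R_{12}z) \\ z \end{bmatrix}.$$

If z is set to zero in this expression, then we obtain the basic solution

$$x_B = \Pi\begin{bmatrix} R_{11}^{-1}c \\ 0 \end{bmatrix}.$$

Notice that $x_B$ has at most r nonzero components and so $Ax_B$ involves a
subset of A's columns.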

The basic solution is not the minimal 2-norm solution unless the submatrix
$R_{12}$ is zero since

$$\| x_{LS} \|_2 = \min_{z \in \mathbb{R}^{n-r}} \left\| x_B - \Pi\begin{bmatrix} R_{11}^{-1}R_{12} \\ -I_{n-r} \end{bmatrix} z \right\|_2. \qquad (5.5.4)$$

Indeed, this characterization of $\| x_{LS} \|_2$ can be used to show

$$1 \le \frac{\| x_B \|_2}{\| x_{LS} \|_2} \le \sqrt{1 + \| R_{11}^{-1}R_{12} \|_2^2}. \qquad (5.5.5)$$

See Golub and Pereyra (1976) for details.
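A NumPy illustration of the basic solution and the bound (5.5.5). To stay short, it assumes the leading r columns of A are independent so that no pivoting is needed; in general, Algorithm 5.4.1 supplies $\Pi$. The minimum-norm reference solution comes from `np.linalg.lstsq` with an explicit `rcond` cutoff.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, r = 8, 6, 3
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))   # rank 3
b = rng.standard_normal(m)

# Assumption: Pi = I suffices because the leading r columns are independent.
Q, R = np.linalg.qr(A, mode="complete")
R11, R12 = R[:r, :r], R[:r, r:]
c = (Q.T @ b)[:r]

x_B = np.zeros(n)
x_B[:r] = np.linalg.solve(R11, c)               # basic solution: z = 0
x_LS = np.linalg.lstsq(A, b, rcond=1e-10)[0]    # minimum 2-norm solution

ratio = np.linalg.norm(x_B) / np.linalg.norm(x_LS)
bound = np.sqrt(1.0 + np.linalg.norm(np.linalg.solve(R11, R12), 2) ** 2)
print(f"1 <= {ratio:.4f} <= {bound:.4f}")
```

Both vectors attain the same residual norm. Only their 2-norms differ, by a factor that (5.5.5) bounds.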


5.5.7 Numerical Rank Determination with $A\Pi = QR$
If Algorithm 5.4.1 is used to compute $x_B$, then care must be exercised in
the determination of rank(A). In order to appreciate the difficulty of this,
suppose

$$fl(H_k\cdots H_1A\Pi_1\cdots\Pi_k) = \hat R^{(k)} = \begin{bmatrix} \hat R_{11}^{(k)} & \hat R_{12}^{(k)} \\ 0 & \hat R_{22}^{(k)} \end{bmatrix}, \qquad \hat R_{11}^{(k)} \in \mathbb{R}^{k\times k},$$

is the matrix computed after k steps of the algorithm have been executed
in floating point. Suppose rank(A) = k. Because of roundoff error, $\hat R_{22}^{(k)}$
will not be exactly zero. However, if $\hat R_{22}^{(k)}$ is suitably small in norm then it
is reasonable to terminate the reduction and declare A to have rank k. A
typical termination criterion might be

$$\| \hat R_{22}^{(k)} \|_2 \le \epsilon_1\| A \|_2 \qquad (5.5.6)$$

for some small machine-dependent parameter $\epsilon_1$. In view of the roundoff
properties associated with Householder matrix computation (cf. §5.1.12),
we know that $\hat R^{(k)}$ is the exact R factor of a matrix $A + E_k$, where

$$\| E_k \|_2 \le \epsilon_2\| A \|_2, \qquad \epsilon_2 = O(\mathbf{u}).$$

Using Theorem 2.5.2 we have

$$\sigma_{k+1}(A + E_k) = \sigma_{k+1}(\hat R^{(k)}) \le \| \hat R_{22}^{(k)} \|_2.$$

Since $\sigma_{k+1}(A) \le \sigma_{k+1}(A + E_k) + \| E_k \|_2$, it follows that

$$\sigma_{k+1}(A) \le (\epsilon_1 + \epsilon_2)\| A \|_2.$$

In other words, a relative perturbation of $O(\epsilon_1 + \epsilon_2)$ in A can yield a rank-k
matrix. With this termination criterion, we conclude that QR with column
pivoting "discovers" rank degeneracy if in the course of the reduction $\hat R_{22}^{(k)}$
is small for some k < n.

Unfortunately, this is not always the case. A matrix can be nearly rank
deficient without a single $\hat R_{22}^{(k)}$ being particularly small. Thus, QR with
column pivoting by itself is not entirely reliable as a method for detecting
near rank deficiency. However, if a good condition estimator is applied to
$\hat R$ it is practically impossible for near rank deficiency to go unnoticed.


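Example 5.5.1 Let $T_n(c)$ be the matrix

$$T_n(c) = \mathrm{diag}(1, s, \ldots, s^{n-1})\begin{bmatrix} 1 & -c & -c & \cdots & -c \\ 0 & 1 & -c & \cdots & -c \\ \vdots & & \ddots & \ddots & \vdots \\ \vdots & & & 1 & -c \\ 0 & \cdots & \cdots & 0 & 1 \end{bmatrix}$$

with $c^2 + s^2 = 1$ with c, s > 0. (See Lawson and Hanson (1974, p.31).) These matrices are
unaltered by Algorithm 5.4.1 and thus $\| \hat R_{22}^{(k)} \|_2 \ge s^{n-1}$ for k = 1:n−1. This inequality
implies (for example) that the matrix $T_{100}(.2)$ has no particularly small trailing principal
submatrix since $s^{99} \approx .13$. However, it can be shown that $\sigma_n = O(10^{-8})$.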
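The following sketch builds $T_n(c)$ and checks the two facts quoted in Example 5.5.1 for n = 100 and c = .2. The trailing diagonal entry is $s^{99} \approx .13$, while the smallest singular value is tiny. In exact arithmetic every column of $T_n(c)$ has unit 2-norm, because $c^2(1 + s^2 + \cdots + s^{2(j-2)}) + s^{2(j-1)} = 1$. A pivot rule that takes the smallest maximizing index therefore never interchanges columns. The helper name `kahan` is ours.

```python
import numpy as np

def kahan(n, c):
    """T_n(c) of Example 5.5.1, with s = sqrt(1 - c^2)."""
    s = np.sqrt(1.0 - c * c)
    K = np.eye(n) - c * np.triu(np.ones((n, n)), 1)   # 1 on the diagonal, -c above it
    return np.diag(s ** np.arange(n)) @ K

T = kahan(100, 0.2)
col_norms = np.linalg.norm(T, axis=0)            # all 1 in exact arithmetic
sigma = np.linalg.svd(T, compute_uv=False)
print(f"r_nn = s^99 = {T[-1, -1]:.3f}   sigma_min = {sigma[-1]:.3e}")
```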


5.5.8 Numerical Rank and the SVD 


We now focus our attention on the ability of the SVD to handle rank-deficiency
in the presence of roundoff. Recall that if $A = U\Sigma V^T$ is the
SVD of A, then

$$x_{LS} = \sum_{i=1}^{r} \frac{u_i^Tb}{\sigma_i}\,v_i \qquad (5.5.7)$$

where r = rank(A). Denote the computed versions of U, V, and $\Sigma =
\mathrm{diag}(\sigma_i)$ by $\hat U$, $\hat V$, and $\hat\Sigma = \mathrm{diag}(\hat\sigma_i)$. Assume that both sequences of singular
values range from largest to smallest. For a reasonably implemented SVD
algorithm it can be shown that

$$\hat U = W + \Delta U, \qquad W^TW = I_m, \qquad \| \Delta U \|_2 \le \epsilon \qquad (5.5.8)$$
$$\hat V = Z + \Delta V, \qquad Z^TZ = I_n, \qquad \| \Delta V \|_2 \le \epsilon \qquad (5.5.9)$$
$$\hat\Sigma = W^T(A + \Delta A)Z, \qquad \| \Delta A \|_2 \le \epsilon\| A \|_2 \qquad (5.5.10)$$

where ε is a small multiple of u, the machine precision. In plain English, the
SVD algorithm computes the singular values of a "nearby" matrix A + ΔA.

Note that $\hat U$ and $\hat V$ are not necessarily close to their exact counterparts.
However, we can show that $\hat\sigma_k$ is close to $\sigma_k$. Using (5.5.10) and Theorem
2.5.2 we have

$$\sigma_k = \min_{\mathrm{rank}(B) = k-1} \| A - B \|_2 = \min_{\mathrm{rank}(B) = k-1} \| (\hat\Sigma - B) - W^T(\Delta A)Z \|_2.$$

Since $\| W^T(\Delta A)Z \|_2 \le \epsilon\| A \|_2 = \epsilon\sigma_1$ and

$$\min_{\mathrm{rank}(B) = k-1} \| \hat\Sigma - B \|_2 = \hat\sigma_k,$$

it follows that $|\sigma_k - \hat\sigma_k| \le \epsilon\sigma_1$ for k = 1:n. Thus, if A has rank r then we
can expect n − r of the computed singular values to be small. Near rank
deficiency in A cannot escape detection when the SVD of A is computed.

Example 5.5.2 For the matrix $T_{100}(.2)$ in Example 5.5.1, $\hat\sigma_n \approx .367\cdot 10^{-8}$.

One approach to estimating r = rank(A) from the computed singular
values is to have a tolerance δ > 0 and a convention that A has "numerical
rank" $\hat r$ if the $\hat\sigma_i$ satisfy

$$\hat\sigma_1 \ge \cdots \ge \hat\sigma_{\hat r} > \delta \ge \hat\sigma_{\hat r+1} \ge \cdots \ge \hat\sigma_n.$$

The tolerance δ should be consistent with the machine precision, e.g., $\delta =
\mathbf{u}\| A \|_\infty$. However, if the general level of relative error in the data is larger
than u, then δ should be correspondingly bigger, e.g., $\delta = 10^{-2}\| A \|_\infty$ if
the entries in A are correct to two digits.

If $\hat r$ is accepted as the numerical rank then we can regard

$$x_{\hat r} = \sum_{i=1}^{\hat r} \frac{\hat u_i^Tb}{\hat\sigma_i}\,\hat v_i$$

as an approximation to $x_{LS}$. Since $\| x_{\hat r} \|_2 \approx 1/\sigma_{\hat r} \le 1/\delta$ then δ may also
be chosen with the intention of producing an approximate LS solution with
suitably small norm. In §12.1, we discuss more sophisticated methods for
doing this.
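Here is a sketch of the truncated-SVD approximation $x_{\hat r}$ with two choices of δ. The test matrix is built to have singular values 10, 1, and $10^{-9}$. That matrix, and the second tolerance (suited to data trusted to about six digits), are illustrative assumptions. NumPy's `np.finfo(float).eps` equals 2u, not u. The function name is hypothetical.

```python
import numpy as np

def truncated_svd_solution(A, b, delta):
    """x_rhat = sum_{i <= rhat} (u_i^T b / sigma_i) v_i, where rhat counts sigma_i > delta."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    r_hat = int(np.sum(s > delta))
    x = Vt[:r_hat].T @ ((U[:, :r_hat].T @ b) / s[:r_hat])
    return x, r_hat

rng = np.random.default_rng(4)
U0, _ = np.linalg.qr(rng.standard_normal((5, 3)))
V0, _ = np.linalg.qr(rng.standard_normal((3, 3)))
A = U0 @ np.diag([10.0, 1.0, 1e-9]) @ V0.T       # nearly rank 2 by construction
b = rng.standard_normal(5)

norm_inf = np.linalg.norm(A, np.inf)
for label, delta in [("eps*||A||_inf", np.finfo(float).eps * norm_inf),
                     ("1e-6*||A||_inf", 1e-6 * norm_inf)]:
    x, r_hat = truncated_svd_solution(A, b, delta)
    print(f"{label:>15}: r_hat = {r_hat}, ||x||_2 = {np.linalg.norm(x):.3e}")
```

With the machine-precision tolerance, the $10^{-9}$ singular value is kept and $\| x_{\hat r} \|_2$ is huge. The coarser tolerance declares numerical rank 2 and yields a solution of modest norm.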

If $\hat\sigma_{\hat r} \gg \delta$, then we have reason to be comfortable with $x_{\hat r}$ because A
can then be unambiguously regarded as a rank-$\hat r$ matrix (modulo δ).

On the other hand, $\{\hat\sigma_1, \ldots, \hat\sigma_n\}$ might not clearly split into subsets
of small and large singular values, making the determination of $\hat r$ by this
means somewhat arbitrary. This leads to more complicated methods for
estimating rank which we now discuss in the context of the LS problem.

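For example, suppose r = n, and assume for the moment that ΔA = 0
in (5.5.10). Thus $\sigma_i = \hat\sigma_i$ for i = 1:n. Denote the ith columns of the
matrices $\hat U$, W, $\hat V$, and Z by $u_i$, $w_i$, $v_i$, and $z_i$, respectively. Subtracting
$x_{\hat r}$ from $x_{LS}$ and taking norms we obtain

$$\| x_{\hat r} - x_{LS} \|_2 \le \left\| \sum_{i=1}^{\hat r} \frac{(w_i^Tb)z_i - (u_i^Tb)v_i}{\sigma_i} \right\|_2 + \sqrt{\sum_{i=\hat r+1}^{n} \left(\frac{w_i^Tb}{\sigma_i}\right)^2}.$$

From (5.5.8) and (5.5.9) it is easy to verify that

$$\| (w_i^Tb)z_i - (u_i^Tb)v_i \|_2 \le 2(1 + \epsilon)\epsilon\| b \|_2 \qquad (5.5.11)$$

and therefore

$$\| x_{\hat r} - x_{LS} \|_2 \le \frac{\hat r}{\sigma_{\hat r}}\,2(1 + \epsilon)\epsilon\| b \|_2 + \sqrt{\sum_{i=\hat r+1}^{n} \left(\frac{w_i^Tb}{\sigma_i}\right)^2}.$$

The parameter $\hat r$ can be determined as that integer which minimizes the
upper bound. Notice that the first term in the bound increases with $\hat r$,
while the second decreases.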

On occasions when minimizing the residual is more important than accuracy
in the solution, we can determine $\hat r$ on the basis of how close we
surmise $\| b - Ax_{\hat r} \|_2$ is to the true minimum. Paralleling the above analysis,
it can be shown that


I b~ Aza la ~ l5 — Azzs l2 < (n =>) b I2 + ell lla (r+ 262) . 


Again $\hat r$ could be chosen to minimize the upper bound. See Varah (1973)
for practical details and also the LAPACK manual.


5.5.9 Some Comparisons 


As we mentioned, when solving the LS problem via the SVD, only Σ and
V have to be computed. The following table compares the efficiency of this
approach with the other algorithms that we have presented.

LS Algorithm                       Flop Count
Normal Equations                   $mn^2 + n^3/3$
Householder Orthogonalization      $2mn^2 - 2n^3/3$
Modified Gram-Schmidt              $2mn^2$
Givens Orthogonalization           $3mn^2 - n^3$
Householder Bidiagonalization      $4mn^2 - 4n^3/3$
R-Bidiagonalization                $2mn^2 + 2n^3$
Golub-Reinsch SVD                  $4mn^2 + 8n^3$
R-SVD                              $2mn^2 + 11n^3$

Problems 


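P5.5.1 Show that if

$$A = \begin{bmatrix} T & S \\ 0 & 0 \end{bmatrix}, \qquad T \in \mathbb{R}^{r\times r}, \; S \in \mathbb{R}^{r\times(n-r)},$$

where r = rank(A) and T is nonsingular, then

$$X = \begin{bmatrix} T^{-1} & 0 \\ 0 & 0 \end{bmatrix} \in \mathbb{R}^{n\times m}$$

satisfies AXA = A and $(AX)^T = (AX)$. In this case, we say that X is a (1,3) pseudo-inverse
of A. Show that for general A, $x_B = Xb$ where X is a (1,3) pseudo-inverse of A.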


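P5.5.2 Define $B(\lambda) \in \mathbb{R}^{n\times m}$ by $B(\lambda) = (A^TA + \lambda I)^{-1}A^T$, where λ > 0. Show

$$\| B(\lambda) - A^+ \|_2 = \frac{\lambda}{\sigma_r(A)[\sigma_r(A)^2 + \lambda]}, \qquad r = \mathrm{rank}(A),$$

and therefore that $B(\lambda) \to A^+$ as $\lambda \to 0$.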
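P5.5.3 Consider the rank deficient LS problem

$$\min_{y \in \mathbb{R}^r, \; z \in \mathbb{R}^{n-r}} \left\| \begin{bmatrix} R & S \\ 0 & 0 \end{bmatrix}\begin{bmatrix} y \\ z \end{bmatrix} - \begin{bmatrix} c \\ d \end{bmatrix} \right\|_2$$

where $R \in \mathbb{R}^{r\times r}$, $S \in \mathbb{R}^{r\times(n-r)}$, $y \in \mathbb{R}^r$, and $z \in \mathbb{R}^{n-r}$. Assume that R is upper triangular
and nonsingular. Show how to obtain the minimum norm solution to this problem
by computing an appropriate QR factorization without pivoting and then solving for the
appropriate y and z.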
P5.5.4 Show that if $A_k \to A$ and $A_k^+ \to A^+$, then there exists an integer $k_0$ such that
rank($A_k$) is constant for all $k \ge k_0$.
P5.5.5 Show that if $A \in \mathbb{R}^{m\times n}$ has rank n, then so does A + E if we have the inequality
$\| E \|_2\| A^+ \|_2 < 1$.


Notes and References for Sec. 5.5
The pseudo-inverse literature is vast, as evidenced by the 1,775 references in
M.Z. Nashed (1976). Generalized Inverses and Applications, Academic Press, New York.
The differentiation of the pseudo-inverse is further discussed in

C.L. Lawson and R.J. Hanson (1969). "Extensions and Applications of the Householder
Algorithm for Solving Linear Least Squares Problems," Math. Comp. 23, 787-812.
G.H. Golub and V. Pereyra (1973). "The Differentiation of Pseudo-Inverses and Nonlinear
Least Squares Problems Whose Variables Separate," SIAM J. Num. Anal. 10,
413-32.


Survey treatments of LS perturbation theory may be found in Lawson and Hanson
(1974), Stewart and Sun (1991), Björck (1996), and

P.A. Wedin (1973). "Perturbation Theory for Pseudo-Inverses," BIT 13, 217-32.

G.W. Stewart (1977). "On the Perturbation of Pseudo-Inverses, Projections, and Linear
Least Squares," SIAM Review 19, 634-62.

Even for full rank problems, column pivoting seems to produce more accurate solutions.
The error analysis in the following paper attempts to explain why.

L.S. Jennings and M.R. Osborne (1974). "A Direct Error Analysis for Least Squares,"
Numer. Math. 22, 322-32.


Various other aspects of rank deficiency are discussed in

J.M. Varah (1973). "On the Numerical Solution of Ill-Conditioned Linear Systems with
Applications to Ill-Posed Problems," SIAM J. Num. Anal. 10, 257-67.

G.W. Stewart (1984). "Rank Degeneracy," SIAM J. Sci. and Stat. Comp. 5, 403-413.

P.C. Hansen (1987). "The Truncated SVD as a Method for Regularization," BIT 27,
534-553.

G.W. Stewart (1987). "Collinearity and Least Squares Regression," Statistical Science
2, 68-100.

We have more to say on the subject in §12.1 and §12.2.


5.6 Weighting and Iterative Improvement


The concepts of scaling and iterative improvement were introduced in the 
Chapter 3 context of square linear systems. Generalizations of these ideas 
that are applicable to the least squares problem are now offered. 

5.6.1 Column Weighting 

Suppose $G \in \mathbb{R}^{n\times n}$ is nonsingular. A solution to the LS problem

$$\min \| Ax - b \|_2, \qquad A \in \mathbb{R}^{m\times n}, \; b \in \mathbb{R}^m \qquad (5.6.1)$$

can be obtained by finding the minimum 2-norm solution $y_{LS}$ to

$$\min \| (AG)y - b \|_2 \qquad (5.6.2)$$

and then setting $x_G = Gy_{LS}$. If rank(A) = n, then $x_G = x_{LS}$. Otherwise,
$x_G$ is the minimum G-norm solution to (5.6.1), where the G-norm is defined
by $\| z \|_G = \| G^{-1}z \|_2$.
The choice of G is important. Sometimes its selection can be based on
a priori knowledge of the uncertainties in A. On other occasions, it may be
desirable to normalize the columns of A by setting

$$G = G_0 = \mathrm{diag}(1/\| A(:,1) \|_2, \ldots, 1/\| A(:,n) \|_2).$$

Van der Sluis (1969) has shown that with this choice, $\kappa_2(AG)$ is approximately
minimized. Since the computed accuracy of $y_{LS}$ depends on $\kappa_2(AG)$,
a case can be made for setting $G = G_0$.

We remark that column weighting affects singular values. Consequently,
a scheme for determining numerical rank may not return the same estimates
when applied to A and AG. See Stewart (1984b).
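A small NumPy experiment with the normalization $G = G_0$. The matrix has columns deliberately scaled by 1, $10^4$, and $10^{-3}$, an illustrative choice. Column scaling changes $\kappa_2$ dramatically. Because rank(A) = n here, it leaves the least squares fit itself unchanged.

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.standard_normal((8, 3))
A = X * np.array([1.0, 1e4, 1e-3])               # badly scaled columns (illustrative)
b = rng.standard_normal(8)

G0 = np.diag(1.0 / np.linalg.norm(A, axis=0))    # G0 = diag(1/||A(:,j)||_2)
print(f"kappa2(A)    = {np.linalg.cond(A):.2e}")
print(f"kappa2(A G0) = {np.linalg.cond(A @ G0):.2e}")

x = np.linalg.lstsq(A, b, rcond=None)[0]
y = np.linalg.lstsq(A @ G0, b, rcond=None)[0]
x_G = G0 @ y                                      # x_G = G0 y_LS, equal to x_LS when rank(A) = n
```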


5.6.2 Row Weighting 


Let $D = \mathrm{diag}(d_1, \ldots, d_m)$ be nonsingular and consider the weighted least
squares problem

$$\text{minimize } \| D(Ax - b) \|_2, \qquad A \in \mathbb{R}^{m\times n}, \; b \in \mathbb{R}^m. \qquad (5.6.3)$$

Assume rank(A) = n and that $x_D$ solves (5.6.3). It follows that the solution
$x_{LS}$ to (5.6.1) satisfies

$$x_D - x_{LS} = (A^TD^2A)^{-1}A^T(D^2 - I)(b - Ax_{LS}). \qquad (5.6.4)$$

This shows that row weighting in the LS problem affects the solution. (An
important exception occurs when $b \in \mathrm{ran}(A)$ for then $x_D = x_{LS}$.)

One way of determining D is to let $d_k$ be some measure of the uncertainty
in $b_k$, e.g., the reciprocal of the standard deviation in $b_k$. The
tendency is for $r_k = e_k^T(b - Ax_D)$ to be small whenever $d_k$ is large. The
precise effect of $d_k$ on $r_k$ can be clarified as follows. Define

$$D(\delta) = \mathrm{diag}(d_1, \ldots, d_{k-1}, d_k\sqrt{1 + \delta}, d_{k+1}, \ldots, d_m)$$

where δ > −1. If x(δ) minimizes $\| D(\delta)(Ax - b) \|_2$ and $r_k(\delta)$ is the k-th
component of b − Ax(δ), then it can be shown that

$$r_k(\delta) = \frac{r_k}{1 + \delta d_k^2\,e_k^TA(A^TD^2A)^{-1}A^Te_k}. \qquad (5.6.5)$$

This explicit expression shows that $r_k(\delta)$ is a monotone decreasing function
of δ. Of course, how $r_k$ changes when all the weights are varied is much
more complicated.


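Example 5.6.1 Suppose

$$A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \\ 7 & 8 \end{bmatrix} \qquad\text{and}\qquad b = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}.$$

If D = I then $x_D = [\,-1, .85\,]^T$ and $r = b - Ax_D = [\,.3, -.4, -.1, .2\,]^T$. On
the other hand, if D = diag(1000, 1, 1, 1) then we have $x_D \approx [\,-1.42, 1.21\,]^T$ and

$$r = b - Ax_D = [\,.000000428, \; -.571428, \; -.142857, \; .285714\,]^T.$$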
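The numbers in Example 5.6.1, together with the identities (5.6.4) and (5.6.5), can be checked in a few lines of NumPy. The helper `weighted_ls` is a hypothetical name that simply solves the row-scaled problem $\min \| D(Ax - b) \|_2$. For (5.6.5), the base weights are D = I with k = 1, so that $d_1\sqrt{1 + \delta} = 1000$.

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]])
b = np.array([1.0, 0.0, 0.0, 0.0])

def weighted_ls(A, b, d):
    """Minimize ||D(Ax - b)||_2 with D = diag(d) by solving the row-scaled problem."""
    return np.linalg.lstsq(d[:, None] * A, d * b, rcond=None)[0]

x_LS = weighted_ls(A, b, np.ones(4))
d = np.array([1000.0, 1.0, 1.0, 1.0])
x_D = weighted_ls(A, b, d)
print("x_LS =", x_LS, "  r =", b - A @ x_LS)
print("x_D  =", x_D, "  r =", b - A @ x_D)

# (5.6.4): x_D - x_LS = (A^T D^2 A)^{-1} A^T (D^2 - I)(b - A x_LS)
D2 = np.diag(d ** 2)
rhs = np.linalg.solve(A.T @ D2 @ A, A.T @ (D2 - np.eye(4)) @ (b - A @ x_LS))

# (5.6.5) with base weights D = I, k = 1, and delta chosen so that d_1 sqrt(1 + delta) = 1000
delta = 1000.0 ** 2 - 1.0
e1 = np.array([1.0, 0.0, 0.0, 0.0])
h = e1 @ A @ np.linalg.solve(A.T @ A, A.T @ e1)   # e_1^T A (A^T A)^{-1} A^T e_1
r1_pred = (b - A @ x_LS)[0] / (1.0 + delta * h)
print("r_1(delta) predicted by (5.6.5):", r1_pred)
```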


5.6.3 Generalized Least Squares 


In many estimation problems, the vector of observations b is related to x
through the equation

$$b = Ax + w \qquad (5.6.6)$$

where the noise vector w has zero mean and a symmetric positive definite
variance-covariance matrix $\sigma^2W$. Assume that W is known and that
$W = BB^T$ for some $B \in \mathbb{R}^{m\times m}$. The matrix B might be given or it might
be W's Cholesky triangle. In order that all the equations in (5.6.6) contribute
equally to the determination of x, statisticians frequently solve the
LS problem

$$\min \| B^{-1}(Ax - b) \|_2. \qquad (5.6.7)$$

An obvious computational approach to this problem is to form $\hat A = B^{-1}A$
and $\hat b = B^{-1}b$ and then apply any of our previous techniques to minimize
$\| \hat Ax - \hat b \|_2$. Unfortunately, x will be poorly determined by such a procedure
if B is ill-conditioned.

A much more stable way of solving (5.6.7) using orthogonal transformations
has been suggested by Paige (1979a, 1979b). It is based on the idea
that (5.6.7) is equivalent to the generalized least squares problem,

$$\min_{b = Ax + Bv} v^Tv. \qquad (5.6.8)$$

Notice that this problem is defined even if A and B are rank deficient.
Although Paige's technique can be applied when this is the case, we shall
describe it under the assumption that both these matrices have full rank.

The first step is to compute the QR factorization of A:

$$Q^TA = \begin{bmatrix} R_1 \\ 0 \end{bmatrix}, \qquad Q = [\,Q_1 \;\; Q_2\,], \qquad R_1 \in \mathbb{R}^{n\times n}, \; Q_1 \in \mathbb{R}^{m\times n}.$$

An orthogonal matrix $Z \in \mathbb{R}^{m\times m}$ is then determined so that

$$Q_2^TBZ = [\,0 \;\; S\,], \qquad Z = [\,Z_1 \;\; Z_2\,], \qquad S \in \mathbb{R}^{(m-n)\times(m-n)}, \; Z_1 \in \mathbb{R}^{m\times n},$$

where S is upper triangular. With the use of these orthogonal matrices the
constraint in (5.6.8) transforms to

$$\begin{bmatrix} Q_1^Tb \\ Q_2^Tb \end{bmatrix} = \begin{bmatrix} R_1 \\ 0 \end{bmatrix}x + \begin{bmatrix} Q_1^TBZ_1 & Q_1^TBZ_2 \\ 0 & S \end{bmatrix}\begin{bmatrix} Z_1^Tv \\ Z_2^Tv \end{bmatrix}.$$



Notice that the “bottom half” of this equation determines $v$: it fixes $u = Z_2^T v$, and since $\| v \|_2 = \| Z^T v \|_2$, the minimum is attained with $Z_1^T v = 0$. Thus

$$S u = Q_2^T b, \qquad v = Z_2 u, \tag{5.6.9}$$

while the “top half” prescribes $x$:

$$R_1 x = Q_1^T b - (Q_1^T B Z_1 Z_1^T + Q_1^T B Z_2 Z_2^T) v = Q_1^T b - Q_1^T B Z_2 u. \tag{5.6.10}$$

The attractiveness of this method is that all potential ill-conditioning is concentrated in the triangular systems (5.6.9) and (5.6.10). Moreover, Paige (1979b) has shown that the above procedure is numerically stable, something that is not true of any method that explicitly forms $B^{-1}A$.
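A minimal NumPy/SciPy sketch of (5.6.9)-(5.6.10), assuming $A$ has full column rank and $B$ is square and nonsingular. We obtain $Z$ from an RQ factorization of $Q_2^T B$ via `scipy.linalg.rq`, which returns factors $T$ and $W$ with $Q_2^T B = TW$ and $T = [\,0 \;\; S\,]$, so that $Z = W^T$. For simplicity $Q$ is formed explicitly with `numpy.linalg.qr(mode="complete")`; the function name `paige_gls` is ours.

```python
import numpy as np
from scipy.linalg import rq, solve_triangular

def paige_gls(A, B, b):
    """Solve min v^T v subject to b = Ax + Bv (A: m-by-n full rank, B: m-by-m nonsingular)."""
    m, n = A.shape
    Q, R = np.linalg.qr(A, mode="complete")   # Q^T A = [R1; 0]
    R1, Q1, Q2 = R[:n, :n], Q[:, :n], Q[:, n:]
    T, W = rq(Q2.T @ B)                        # Q2^T B = T W with T = [0 S]
    Z2 = W.T[:, n:]                            # Z = W^T; Z2 holds its last m-n columns
    S = T[:, n:]                               # (m-n)-by-(m-n) upper triangular
    u = solve_triangular(S, Q2.T @ b)          # (5.6.9): S u = Q2^T b
    v = Z2 @ u                                 # (5.6.9): v = Z2 u
    x = solve_triangular(R1, Q1.T @ b - Q1.T @ (B @ (Z2 @ u)))   # (5.6.10)
    return x, v

# Hypothetical data; B is deliberately well conditioned so that the naive
# approach (forming B^{-1}A) can serve as a cross-check.
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 3))
B = np.eye(6) + 0.1 * np.tril(rng.standard_normal((6, 6)))
b = rng.standard_normal(6)
x, v = paige_gls(A, B, b)
x_naive = np.linalg.lstsq(np.linalg.solve(B, A), np.linalg.solve(B, b), rcond=None)[0]
```

On an ill-conditioned $B$ the naive cross-check is exactly the computation the text warns against, so it is only meaningful here because $B$ is close to the identity.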


5.6.4 Iterative Improvement 


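A technique for refining an approximate LS solution has been analyzed by Björck (1967, 1968). It is based on the idea that if

$$\begin{bmatrix} I_m & A \\ A^T & 0 \end{bmatrix} \begin{bmatrix} r \\ x \end{bmatrix} = \begin{bmatrix} b \\ 0 \end{bmatrix}, \qquad A \in \mathbb{R}^{m \times n},\; b \in \mathbb{R}^m, \tag{5.6.11}$$

then $\| b - Ax \|_2 = \min$. This follows because $r + Ax = b$ and $A^T r = 0$ imply $A^T A x = A^T b$. The above augmented system is nonsingular if $\operatorname{rank}(A) = n$, which we hereafter assume.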
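As a quick sanity check of (5.6.11), one can assemble the augmented matrix explicitly and compare its solution with an ordinary least squares solver. This NumPy sketch uses hypothetical random data; forming the $(m+n)$-by-$(m+n)$ matrix is for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 7, 3
A = rng.standard_normal((m, n))      # hypothetical full-column-rank A
b = rng.standard_normal(m)

# Augmented system (5.6.11): [I A; A^T 0][r; x] = [b; 0].
K = np.block([[np.eye(m), A], [A.T, np.zeros((n, n))]])
sol = np.linalg.solve(K, np.concatenate([b, np.zeros(n)]))
r, x = sol[:m], sol[m:]

x_ls = np.linalg.lstsq(A, b, rcond=None)[0]
print(np.allclose(x, x_ls), np.allclose(A.T @ r, 0.0))
```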

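By casting the LS problem in the form of a square linear system, the iterative improvement scheme (3.5.5) can be applied:

$$
\begin{array}{l}
r^{(0)} = 0;\; x^{(0)} = 0 \\
\textbf{for } k = 0, 1, \ldots \\
\quad \begin{bmatrix} f^{(k)} \\ g^{(k)} \end{bmatrix} = \begin{bmatrix} b \\ 0 \end{bmatrix} - \begin{bmatrix} I & A \\ A^T & 0 \end{bmatrix} \begin{bmatrix} r^{(k)} \\ x^{(k)} \end{bmatrix} \\
\quad \begin{bmatrix} I & A \\ A^T & 0 \end{bmatrix} \begin{bmatrix} p^{(k)} \\ z^{(k)} \end{bmatrix} = \begin{bmatrix} f^{(k)} \\ g^{(k)} \end{bmatrix} \\
\quad \begin{bmatrix} r^{(k+1)} \\ x^{(k+1)} \end{bmatrix} = \begin{bmatrix} r^{(k)} \\ x^{(k)} \end{bmatrix} + \begin{bmatrix} p^{(k)} \\ z^{(k)} \end{bmatrix} \\
\textbf{end}
\end{array}
$$

The residuals $f^{(k)}$ and $g^{(k)}$ must be computed in higher precision and an original copy of $A$ must be around for this purpose.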
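If the QR factorization of $A$ is available, then the solution of the augmented system is readily obtained. In particular, if $A = QR$ and $R_1 = R(1{:}n, 1{:}n)$, then a system of the form

$$\begin{bmatrix} I_m & A \\ A^T & 0 \end{bmatrix} \begin{bmatrix} p \\ z \end{bmatrix} = \begin{bmatrix} f \\ g \end{bmatrix}$$

transforms to

$$\begin{bmatrix} I_n & 0 & R_1 \\ 0 & I_{m-n} & 0 \\ R_1^T & 0 & 0 \end{bmatrix} \begin{bmatrix} h \\ f_2 \\ z \end{bmatrix} = \begin{bmatrix} f_1 \\ f_2 \\ g \end{bmatrix}$$

where

$$Q^T f = \begin{bmatrix} f_1 \\ f_2 \end{bmatrix}, \qquad Q^T p = \begin{bmatrix} h \\ f_2 \end{bmatrix}, \qquad f_1, h \in \mathbb{R}^n,\; f_2 \in \mathbb{R}^{m-n}.$$

Thus, $p$ and $z$ can be determined by solving the triangular systems $R_1^T h = g$ and $R_1 z = f_1 - h$ and setting $p = Q \begin{bmatrix} h \\ f_2 \end{bmatrix}$. Assuming that $Q$ is stored in factored form, each iteration requires $8mn - 2n^2$ flops.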

The key to the iteration's success is that both the LS residual and solution are updated, not just the solution. Björck (1968) shows that if $\kappa_2(A) \approx \beta^q$ and $t$-digit, base-$\beta$ arithmetic is used, then $x^{(k)}$ has approximately $k(t - q)$ correct base-$\beta$ digits, provided the residuals are computed in double precision. Notice that it is $\kappa_2(A)$, not $\kappa_2(A)^2$, that appears in this heuristic.
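A minimal sketch of the iteration with the QR-based augmented solve, written with NumPy/SciPy. To mimic "$t$-digit arithmetic plus higher-precision residuals" we make an assumption of our own: the factorization and the correction solves run in single precision (`float32`), while $f^{(k)}$, $g^{(k)}$, $r^{(k)}$ and $x^{(k)}$ are kept in double precision and the residuals use the original $A$. For simplicity $Q$ is formed explicitly rather than stored in factored form, so the work per step is not the $8mn - 2n^2$ flops quoted above. The function names are ours.

```python
import numpy as np
from scipy.linalg import solve_triangular

def augmented_solve(Q, R1, f, g):
    """Solve [I A; A^T 0][p; z] = [f; g] given A = Q[R1; 0]."""
    n = R1.shape[0]
    qf = Q.T @ f
    f1, f2 = qf[:n], qf[n:]
    h = solve_triangular(R1, g, trans="T")      # R1^T h = g
    z = solve_triangular(R1, f1 - h)            # R1 z = f1 - h
    p = Q @ np.concatenate([h, f2])             # p = Q [h; f2]
    return p, z

def refine_ls(A, b, iters=5):
    """Iterative improvement for min ||Ax - b||_2 (float32 solves, float64 residuals)."""
    m, n = A.shape
    Q, R = np.linalg.qr(A.astype(np.float32), mode="complete")
    R1 = R[:n, :n]
    r, x = np.zeros(m), np.zeros(n)
    for _ in range(iters):
        f = b - r - A @ x                       # residuals in double precision,
        g = -(A.T @ r)                          # using the original copy of A
        p, z = augmented_solve(Q, R1, f.astype(np.float32), g.astype(np.float32))
        r, x = r + p, x + z                     # accumulate in double precision
    return x, r

rng = np.random.default_rng(2)
A = rng.standard_normal((20, 5))                # hypothetical test problem
b = rng.standard_normal(20)
x, r = refine_ls(A, b)
x_ref = np.linalg.lstsq(A, b, rcond=None)[0]
print(np.linalg.norm(x - x_ref) / np.linalg.norm(x_ref))
```

After the first pass `x` is only about single-precision accurate; the later passes recover nearly double-precision accuracy for this well-conditioned example.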

Problems 


P5.6.1 Verify (5.6.4). 
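P5.6.2 Let $A \in \mathbb{R}^{m \times n}$ have full rank and define the diagonal matrix

$$\Delta = \operatorname{diag}(\underbrace{1, \ldots, 1}_{k-1}, \sqrt{1+\delta}, \underbrace{1, \ldots, 1}_{m-k})$$

for $\delta > -1$. Denote the LS solution to $\min \| \Delta(Ax - b) \|_2$ by $x(\delta)$ and its residual by $r(\delta) = b - Ax(\delta)$. (a) Show

$$r(\delta) = \left( I - \frac{\delta A (A^T A)^{-1} A^T e_k e_k^T}{1 + \delta e_k^T A (A^T A)^{-1} A^T e_k} \right) r(0).$$

(b) Letting $r_k(\delta)$ stand for the $k$th component of $r(\delta)$, show

$$r_k(\delta) = \frac{r_k(0)}{1 + \delta e_k^T A (A^T A)^{-1} A^T e_k}.$$

(c) Use (b) to verify (5.6.5).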


P5.6.3 Show how the SVD can be used to solve the generalized LS problem when the 
matrices A and B in (5.6.8) are rank deficient. 
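P5.6.4 Let $A \in \mathbb{R}^{m \times n}$ have rank $n$ and for $\alpha > 0$ define

$$M(\alpha) = \begin{bmatrix} \alpha I_m & A \\ A^T & 0 \end{bmatrix}.$$

Show that

$$\sigma_{m+n}(M(\alpha)) = \min\left\{ \alpha,\; -\frac{\alpha}{2} + \sqrt{\sigma_n(A)^2 + \left(\frac{\alpha}{2}\right)^2} \right\}$$

and determine the value of $\alpha$ that minimizes $\kappa_2(M(\alpha))$.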
P5.6.5 Another iterative improvement method for LS problems is the following:

$$
\begin{array}{l}
x^{(0)} = 0 \\
\textbf{for } k = 0, 1, \ldots \\
\quad r^{(k)} = b - Ax^{(k)} \quad \text{(double precision)} \\
\quad \| Az^{(k)} - r^{(k)} \|_2 = \min \\
\quad x^{(k+1)} = x^{(k)} + z^{(k)} \\
\textbf{end}
\end{array}
$$

(a) Assuming that the QR factorization of $A$ is available, how many flops per iteration are required? (b) Show that the above iteration results by setting $g^{(k)} = 0$ in the iterative improvement scheme given in §5.6.4.


Notes and References for Sec. 5.6 
Math. 14, 14-23.
Decompositions,” Math. Comp. 43, 483-490.

The theoretical and computational aspects of the generalized least squares problem appear in

S. Kourouklis and C.C. Paige (1981). “A Constrained Least Squares Approach to the General Gauss-Markov Linear Model,” J. Amer. Stat. Assoc. 76, 620-25.

C.C. Paige (1979a). “Computer Solution and Perturbation Analysis of Generalized Least Squares Problems,” Math. Comp. 33, 171-84.

G.H. Golub and J.H. Wilkinson (1966). “Note on Iterative Refinement of Least Squares Solutions,” Numer. Math. 9, 139-48.

Å. Björck and G.H. Golub (1967). “Iterative Refinement of Linear Least Squares Solutions by Householder Transformation,” BIT 7, 322-37.

5.7 Square and Underdetermined Systems


The orthogonalization methods developed in this chapter can be applied to 
square systems and also to systems in which there are fewer equations than 
unknowns. In this brief section we discuss some of the various possibilities. 


5.7.1 Using QR and SVD to Solve Square Systems 


The least squares solvers based on the QR factorization and the SVD can be used to solve square linear systems: just set $m = n$. However, from the flop point of view, Gaussian elimination is the cheapest way to solve a square linear system, as shown in the following table, which assumes that the right-hand side is available at the time of factorization:

Table 5.7.1

Method | Flops
Gaussian Elimination | $2n^3/3$
Householder Orthogonalization | $4n^3/3$
Modified Gram-Schmidt | $2n^3$
Bidiagonalization | $8n^3/3$
Singular Value Decomposition | $12n^3$


Nevertheless, there are three reasons why orthogonalization methods might 
be considered: 


* The flop counts tend to exaggerate the Gaussian elimination advantage. When memory traffic and vectorization overheads are considered, the QR approach is comparable in efficiency.

* The orthogonalization methods have guaranteed stability; there is no “growth factor” to worry about as in Gaussian elimination.

* In cases of ill-conditioning, the orthogonal methods give an added measure of reliability. QR with condition estimation is very dependable and, of course, SVD is unsurpassed when it comes to producing a meaningful solution to a nearly singular system.


We are not expressing a strong preference for orthogonalization methods but merely suggesting viable alternatives to Gaussian elimination.

We also mention that the SVD entry in Table 5.7.1 assumes the availability of $b$ at the time of decomposition. Otherwise, $20n^3$ flops are required because it then becomes necessary to accumulate the $U$ matrix.

If the QR factorization is used to solve $Ax = b$, then we ordinarily have to carry out a back substitution: $Rx = Q^T b$. However, this can be avoided by “preprocessing” $b$. Suppose $H$ is a Householder matrix such that $Hb = \beta e_n$, where $e_n$ is the last column of $I_n$. If we compute the QR factorization of $(HA)^T$, then $A = H^T R^T Q^T$ and the system transforms to

$$R^T y = \beta e_n$$

where $y = Q^T x$. Since $R^T$ is lower triangular, $y = (\beta / r_{nn}) e_n$ and so

$$x = \frac{\beta}{r_{nn}} Q(:, n).$$
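The preprocessing trick can be sketched in a few lines of NumPy. This is an illustration under our own choices: $H$ is formed explicitly as a dense matrix (a real code would apply it implicitly), and the sign of $\beta$ is taken opposite to $b_n$ so that forming $v = b - \beta e_n$ involves no cancellation. The function name is hypothetical.

```python
import numpy as np

def solve_via_qr_no_backsub(A, b):
    """Solve square Ax = b from the QR factorization of (HA)^T, where Hb = beta*e_n."""
    n = b.size
    beta = -np.copysign(np.linalg.norm(b), b[-1])   # |beta| = ||b||_2
    v = b.astype(float)                             # astype returns a copy
    v[-1] -= beta                                   # v = b - beta*e_n
    H = np.eye(n) - 2.0 * np.outer(v, v) / (v @ v)  # Householder matrix: Hb = beta*e_n
    Q, R = np.linalg.qr((H @ A).T)                  # (HA)^T = QR, so A = H^T R^T Q^T
    return (beta / R[-1, -1]) * Q[:, -1]            # x = (beta / r_nn) Q(:, n)

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 5))                     # hypothetical nonsingular matrix
b = rng.standard_normal(5)
x = solve_via_qr_no_backsub(A, b)
```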
5.7.2  Underdetermined Systems 
We say that a linear system

$$Ax = b, \qquad A \in \mathbb{R}^{m \times n},\; b \in \mathbb{R}^m \tag{5.7.1}$$

is underdetermined whenever $m < n$. Notice that such a system either has no solution or has an infinity of solutions. In the second case, it is important to distinguish between algorithms that find the minimum 2-norm solution and those that do not necessarily do so. The first algorithm we present is in the latter category. Assume that $A$ has full row rank and that we apply QR with column pivoting to obtain

$$Q^T A \Pi = [\, R_1 \;\; R_2 \,]$$

where $R_1 \in \mathbb{R}^{m \times m}$ is upper triangular and $R_2 \in \mathbb{R}^{m \times (n-m)}$. Thus, $Ax = b$ transforms to

$$(Q^T A \Pi)(\Pi^T x) = [\, R_1 \;\; R_2 \,] \begin{bmatrix} z_1 \\ z_2 \end{bmatrix} = Q^T b$$

where

$$\Pi^T x = \begin{bmatrix} z_1 \\ z_2 \end{bmatrix}$$

with $z_1 \in \mathbb{R}^m$ and $z_2 \in \mathbb{R}^{n-m}$. By virtue of the column pivoting, $R_1$ is nonsingular because we are assuming that $A$ has full row rank. One solution to the problem is therefore obtained by setting $z_1 = R_1^{-1} Q^T b$ and $z_2 = 0$.

Algorithm 5.7.1 Given $A \in \mathbb{R}^{m \times n}$ with $\operatorname{rank}(A) = m$ and $b \in \mathbb{R}^m$, the following algorithm finds an $x \in \mathbb{R}^n$ such that $Ax = b$.

$Q^T A \Pi = R$   (QR with column pivoting.)
Solve $R(1{:}m, 1{:}m) z_1 = Q^T b$.
Set $x = \Pi \begin{bmatrix} z_1 \\ 0 \end{bmatrix}$.
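A sketch of Algorithm 5.7.1 using SciPy's pivoted QR. We rely on `scipy.linalg.qr(..., pivoting=True)`, which returns `Q`, `R` and an index vector `piv` with `A[:, piv] = Q @ R`; the function name is ours. The usage lines compare the result with the minimum-norm solution to show that the two generally differ.

```python
import numpy as np
from scipy.linalg import qr, solve_triangular

def underdetermined_basic(A, b):
    """Algorithm 5.7.1: some x with Ax = b, for A m-by-n with rank(A) = m < n."""
    m, n = A.shape
    Q, R, piv = qr(A, mode="economic", pivoting=True)  # A[:, piv] = Q @ R
    z1 = solve_triangular(R[:, :m], Q.T @ b)            # R(1:m,1:m) z1 = Q^T b
    x = np.zeros(n)
    x[piv[:m]] = z1                                     # x = Pi [z1; 0]
    return x

rng = np.random.default_rng(5)
A = rng.standard_normal((3, 6))                         # hypothetical full-row-rank data
b = rng.standard_normal(3)
x_basic = underdetermined_basic(A, b)
x_min = np.linalg.pinv(A) @ b                           # minimum 2-norm solution
print(np.linalg.norm(x_basic), np.linalg.norm(x_min))   # x_basic is typically longer
```

The solution returned has at most $m$ nonzero components, one reason such "basic" solutions are sometimes preferred despite not having minimum norm.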
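This algorithm requires $2m^2n - m^3/3$ flops. The minimum norm solution is not guaranteed. (A different $\Pi$ would render a smaller $z_1$.) However, if we compute the QR factorization

$$A^T = QR = Q \begin{bmatrix} R_1 \\ 0 \end{bmatrix}$$

with $R_1 \in \mathbb{R}^{m \times m}$, then $Ax = b$ becomes

$$(QR)^T x = [\, R_1^T \;\; 0 \,] \begin{bmatrix} z_1 \\ z_2 \end{bmatrix} = b$$

where

$$Q^T x = \begin{bmatrix} z_1 \\ z_2 \end{bmatrix}, \qquad z_1 \in \mathbb{R}^m,\; z_2 \in \mathbb{R}^{n-m}.$$

Now the minimum norm solution does follow by setting $z_2 = 0$: the constraint determines $z_1$ alone, and $\| x \|_2 = \| Q^T x \|_2$.

Algorithm 5.7.2 Given $A \in \mathbb{R}^{m \times n}$ with $\operatorname{rank}(A) = m$ and $b \in \mathbb{R}^m$, the following algorithm finds the minimal 2-norm solution to $Ax = b$.

$A^T = QR$   (QR factorization)
Solve $R(1{:}m, 1{:}m)^T z = b$.
$x = Q(:, 1{:}m) z$

This algorithm requires at most $2m^2n - 2m^3/3$ flops.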
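Algorithm 5.7.2 translates almost line for line into NumPy/SciPy. In this sketch (function name ours) `numpy.linalg.qr` in its default reduced mode returns the $n$-by-$m$ matrix $Q(:, 1{:}m)$ and the $m$-by-$m$ triangle $R(1{:}m, 1{:}m)$ directly.

```python
import numpy as np
from scipy.linalg import solve_triangular

def underdetermined_min_norm(A, b):
    """Algorithm 5.7.2: minimum 2-norm solution of Ax = b, A m-by-n with rank(A) = m."""
    Q, R = np.linalg.qr(A.T)                   # A^T = QR (reduced: Q is n-by-m)
    z = solve_triangular(R, b, trans="T")      # R(1:m,1:m)^T z = b
    return Q @ z                               # x = Q(:, 1:m) z

rng = np.random.default_rng(6)
A = rng.standard_normal((3, 6))                # hypothetical full-row-rank data
b = rng.standard_normal(3)
x = underdetermined_min_norm(A, b)
```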
The SVD can also be used to compute the minimal norm solution of an underdetermined $Ax = b$ problem. If

$$A = \sum_{i=1}^{r} \sigma_i u_i v_i^T, \qquad r = \operatorname{rank}(A)$$

is $A$'s singular value expansion, then

$$x = \sum_{i=1}^{r} \frac{u_i^T b}{\sigma_i} v_i.$$

As in the least squares problem, the SVD approach is desirable whenever $A$ is nearly rank deficient.
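The SVD formula can be sketched as follows. The default rank cutoff `tol` is our own assumption (a common choice of the form $\max(m, n) \cdot \text{eps} \cdot \sigma_1$); deciding the numerical rank is exactly where the SVD earns its keep when $A$ is nearly rank deficient. The usage lines pass an explicit, more generous cutoff for a hypothetical matrix whose third row is the sum of the first two.

```python
import numpy as np

def min_norm_svd(A, b, tol=None):
    """x = sum_{i=1}^{r} (u_i^T b / sigma_i) v_i, with r the numerical rank of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    if tol is None:
        tol = max(A.shape) * np.finfo(float).eps * s[0]   # assumed rank threshold
    r = int(np.sum(s > tol))
    return Vt[:r].T @ ((U[:, :r].T @ b) / s[:r])

# Hypothetical rank-2 matrix: row 3 = row 1 + row 2, so Ax = b is consistent
# only for suitable b; we build b = A w.
A = np.array([[1.0, 2.0, 0.0, 1.0, 3.0, 1.0],
              [0.0, 1.0, 1.0, 2.0, 1.0, 4.0],
              [1.0, 3.0, 1.0, 3.0, 4.0, 5.0]])
b = A @ np.array([1.0, -1.0, 2.0, 0.0, 1.0, 3.0])
x = min_norm_svd(A, b, tol=1e-10 * np.linalg.norm(A, 2))
```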


5.7.3 Perturbed Underdetermined Systems 


We conclude this section with a perturbation result for full-rank underde- 
termined systems. 




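Theorem 5.7.1 Suppose $\operatorname{rank}(A) = m < n$ and that $A \in \mathbb{R}^{m \times n}$, $\delta A \in \mathbb{R}^{m \times n}$, $0 \ne b \in \mathbb{R}^m$, and $\delta b \in \mathbb{R}^m$ satisfy

$$\epsilon = \max\{\epsilon_A, \epsilon_b\} < \sigma_m(A),$$

where $\epsilon_A = \| \delta A \|_2 / \| A \|_2$ and $\epsilon_b = \| \delta b \|_2 / \| b \|_2$. If $x$ and $\hat{x}$ are minimum norm solutions that satisfy

$$Ax = b, \qquad (A + \delta A)\hat{x} = b + \delta b,$$

then

$$\frac{\| \hat{x} - x \|_2}{\| x \|_2} \le \kappa_2(A) \left( \epsilon_A \min\{2, n - m + 1\} + \epsilon_b \right) + O(\epsilon^2).$$

Proof. Let $E$ and $f$ be defined by $\delta A / \epsilon$ and $\delta b / \epsilon$. Note that $\operatorname{rank}(A + tE) = m$ for all $0 < t < \epsilon$ and that

$$x(t) = (A + tE)^T \left( (A + tE)(A + tE)^T \right)^{-1} (b + tf)$$

satisfies $(A + tE)x(t) = b + tf$. By differentiating this expression with respect to $t$ and setting $t = 0$ in the result we obtain

$$\dot{x}(0) = \left( I - A^T (AA^T)^{-1} A \right) E^T (AA^T)^{-1} b + A^T (AA^T)^{-1} (f - Ex).$$

Since

$$\| x \|_2 = \| A^T (AA^T)^{-1} b \|_2 \ge \sigma_m(A) \| (AA^T)^{-1} b \|_2,$$

$$\| I - A^T (AA^T)^{-1} A \|_2 = \min(1, n - m),$$

and

$$\frac{\| f \|_2}{\| x \|_2} \le \frac{\| f \|_2 \| A \|_2}{\| b \|_2},$$

we have

$$\frac{\| \hat{x} - x \|_2}{\| x \|_2} = \frac{\| x(\epsilon) - x(0) \|_2}{\| x(0) \|_2} \le \epsilon \frac{\| \dot{x}(0) \|_2}{\| x \|_2} + O(\epsilon^2)$$

$$\le \epsilon \left\{ \min(1, n - m) \frac{\| E \|_2}{\| A \|_2} + \frac{\| f \|_2}{\| b \|_2} + \frac{\| E \|_2}{\| A \|_2} \right\} \kappa_2(A) + O(\epsilon^2),$$

from which the theorem follows. □

Note that there is no $\kappa_2(A)^2$ factor as in the case of overdetermined systems.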
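The bound of Theorem 5.7.1 can be checked on a random instance. In this NumPy sketch (data hypothetical) the minimum-norm solutions are computed with `numpy.linalg.pinv`, and the perturbations are small enough that the $O(\epsilon^2)$ term is negligible.

```python
import numpy as np

rng = np.random.default_rng(10)
m, n = 3, 7
A = rng.standard_normal((m, n))                 # full row rank with probability one
b = rng.standard_normal(m)
dA = 1e-8 * rng.standard_normal((m, n))
db = 1e-8 * rng.standard_normal(m)

x = np.linalg.pinv(A) @ b                       # minimum-norm solution of Ax = b
x_hat = np.linalg.pinv(A + dA) @ (b + db)       # ... and of the perturbed system

eps_A = np.linalg.norm(dA, 2) / np.linalg.norm(A, 2)
eps_b = np.linalg.norm(db) / np.linalg.norm(b)
kappa = np.linalg.cond(A)                       # sigma_1(A) / sigma_m(A)
bound = kappa * (eps_A * min(2, n - m + 1) + eps_b)
rel_err = np.linalg.norm(x_hat - x) / np.linalg.norm(x)
print(rel_err, bound)
```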


Problems 


P5.7.1 Derive the above expression for $\dot{x}(0)$.

P5.7.2 Find the minimal norm solution to the system $Ax = b$ where $A = [\,1 \;\; 2 \;\; 3\,]$ and $b = 1$.

P5.7.3 Show how triangular system solving can be avoided when using the QR factorization to solve an underdetermined system.

P5.7.4 Suppose $b, x \in \mathbb{R}^n$ are given. Consider the following problems:

(a) Find an unsymmetric Toeplitz matrix $T$ so $Tx = b$.

(b) Find a symmetric Toeplitz matrix $T$ so $Tx = b$.

(c) Find a circulant matrix $C$ so $Cx = b$.

Pose each problem in the form $Ap = b$ where $A$ is a matrix made up of entries from $x$ and $p$ is the vector of sought-after parameters.


Notes and References for Sec. 5.7 
Interesting aspects concerning singular systems are discussed in

T.F. Chan (1984). “Deflated Decomposition Solutions of Nearly Singular Systems,” SIAM J. Num. Anal. 21, 738-754.

G.H. Golub and C.D. Meyer (1986). “Using the QR Factorization and Group Inversion to Compute, Differentiate, and Estimate the Sensitivity of Stationary Probabilities for Markov Chains,” SIAM J. Alg. and Disc. Methods 7, 273-281.

Papers on underdetermined systems include

R.E. Cline and R.J. Plemmons (1976). “$\ell_2$-Solutions to Underdetermined Linear Systems,” SIAM Review 18, 92-106.

M. Arioli and A. Laratta (1985). “Error Analysis of an Algorithm for Solving an Underdetermined System,” Numer. Math. 46, 255-268.

J.W. Demmel and N.J. Higham (1993). “Improved Error Bounds for Underdetermined System Solvers,” SIAM J. Matrix Anal. Appl. 14, 1-14.

The QR factorization can of course be used to solve linear systems. See

N.J. Higham (1991). “Iterative Refinement Enhances the Stability of QR Factorization Methods for Solving Linear Equations,” BIT 31, 447-468.


Chapter 6 


Parallel Matrix Computations

§6.1 Basic Concepts
§6.2 Matrix Multiplication
§6.3 Factorizations


The parallel matrix computation area has been the focus of intense research. Although much of the work is machine/system dependent, a number of basic strategies have emerged. Our aim is to present these along with a picture of what it is like to “think parallel” during the design of a matrix computation.

The distributed and shared memory paradigms are considered. We use matrix-vector multiplication to introduce the notion of a node program in §6.1. Load balancing, speed-up, and synchronization are also discussed. In §6.2 matrix-matrix multiplication is used to show the effect of blocking on granularity and to convey the spirit of two-dimensional data flow. Two parallel implementations of the Cholesky factorization are given in §6.3.


Before You Begin 


Chapter 1, §4.1, and §4.2 are assumed. Within this chapter there are the following dependencies:

§6.1 → §6.2 → §6.3

Complementary references include the books by Schönauer (1987), Hockney and Jesshope (1988), Modi (1988), Ortega (1988), Dongarra, Duff, Sorensen, and van der Vorst (1991), and Golub and Ortega (1993) and the excellent review papers by Heller (1978), Ortega and Voigt (1985), Gallivan, Plemmons, and Sameh (1990), and Demmel, Heath, and van der Vorst (1993).


6.1 Basic Concepts 


In this section we introduce the distributed and shared memory paradigms using the gaxpy operation

$$z = y + Ax, \qquad A \in \mathbb{R}^{n \times n},\; x, y, z \in \mathbb{R}^n \tag{6.1.1}$$

as an example. In practice, there is a fuzzy line between these two styles of parallel computing and typically a blend of our comments applies to any particular machine.


6.1.1 Distributed Memory Systems

In a distributed memory multiprocessor each processor has a local memory and executes its own node program. The program can alter values in the executing processor's local memory and can send data in the form of messages to the other processors in the network. The interconnection of the processors defines the network topology and one simple example that is good enough for our introduction is the ring. See Figure 6.1.1. Other important interconnection schemes include the mesh and torus (for their close correspondence with two-dimensional arrays), the hypercube (for its generality and optimality), and the tree (for its handling of divide and conquer procedures). See Ortega and Voigt (1985) for a discussion of the possibilities. Our immediate goal is to develop a ring algorithm for (6.1.1). Matrix multiplication on a torus is discussed in §6.2.

FIGURE 6.1.1 A Four-Processor Ring

Each processor has an identification number. The $\mu$th processor is designated by Proc($\mu$). We say that Proc($\lambda$) is a neighbor of Proc($\mu$) if there is a direct physical connection between them. Thus, in a $p$-processor ring, Proc($p-1$) and Proc(1) are neighbors of Proc($p$).
Important factors in the design of an effective distributed memory algorithm include (a) the number of processors and the capacity of the local memories, (b) how the processors are interconnected, (c) the speed of computation relative to the speed of interprocessor communication, and (d) whether or not a node is able to compute and communicate at the same time.


6.1.2 Communication 


To describe the sending and receiving of messages we adopt a simple notation:

send( ⟨matrix⟩ , ⟨id of the receiving processor⟩ )
recv( ⟨matrix⟩ , ⟨id of the sending processor⟩ )

Scalars and vectors are matrices and therefore messages. In our model, if Proc($\mu$) executes the instruction send($V_{loc}$, $\lambda$), then a copy of the local matrix $V_{loc}$ is sent to Proc($\lambda$) and the execution of Proc($\mu$)'s node program resumes immediately. It is legal for a processor to send a message to itself. To emphasize that a matrix is stored in a local memory we use the subscript “loc.”

If Proc($\mu$) executes the instruction recv($U_{loc}$, $\lambda$), then the execution of its node program is suspended until a message is received from Proc($\lambda$). Once received, the message is placed in a local matrix $U_{loc}$ and Proc($\mu$) resumes execution of its node program.

Although the syntax and semantics of our send/receive notation are adequate for our purposes, they do suppress a number of important details:


* Message assembly overhead. In practice, there may be a penalty associated with the transmission of a matrix whose entries are not contiguous in the sender's local memory. We ignore this detail.

* Message tagging. Messages need not arrive in the order they are sent, and a system of message tagging is necessary so that the receiver is not “confused.” We ignore this detail by assuming that messages do arrive in the order that they are sent.

* Message interpretation overhead. In practice a message is a bit string, and a header must be provided that indicates to the receiver the dimensions of the matrix and the format of the floating point words that are used to represent its entries. Going from message to stored matrix takes time, but it is an overhead that we do not try to quantify.


These simplifications enable us to focus on high-level algorithmic ideas. But 
it should be remembered that the success of a particular implementation 
may hinge upon the control of these hidden overheads. 
6.1.3 Some Distributed Data Structures


Before we can specify our first distributed memory algorithm, we must 
consider the matter of data layout. How are the participating matrices and 
vectors distributed around the network? 

Suppose $x \in \mathbb{R}^n$ is to be distributed among the local memories of a $p$-processor network. Assume for the moment that $n = rp$. Two “canonical” approaches to this problem are store-by-row and store-by-column.

In store-by-column we regard the vector $x$ as an $r$-by-$p$ matrix,

$$x_{r \times p} = [\, x(1{:}r) \;\; x(r{+}1{:}2r) \;\; \cdots \;\; x(1{+}(p{-}1)r{:}n) \,],$$

and store each column in a processor, i.e., $x(1 + (\mu-1)r : \mu r) \in$ Proc($\mu$). (In this context “$\in$” means “is stored in.”) Note that each processor houses a contiguous portion of $x$.

In the store-by-row scheme we regard $x$ as a $p$-by-$r$ matrix

$$x_{p \times r} = [\, x(1{:}p) \;\; x(p{+}1{:}2p) \;\; \cdots \;\; x((r{-}1)p{+}1{:}n) \,],$$

and store each row in a processor, i.e., $x(\mu{:}p{:}n) \in$ Proc($\mu$). Store-by-row is sometimes referred to as the wrap method of distributing a vector because the components of $x$ can be thought of as cards in a deck that are “dealt” to the processors in wrap-around fashion.

If $n$ is not an exact multiple of $p$, then these ideas go through with minor modification. Consider store-by-column with $n = 14$ and $p = 4$:

$$x^T = [\, \underbrace{x_1 \; x_2 \; x_3 \; x_4}_{\text{Proc}(1)} \mid \underbrace{x_5 \; x_6 \; x_7 \; x_8}_{\text{Proc}(2)} \mid \underbrace{x_9 \; x_{10} \; x_{11}}_{\text{Proc}(3)} \mid \underbrace{x_{12} \; x_{13} \; x_{14}}_{\text{Proc}(4)} \,].$$

In general, if $n = pr + q$ with $0 \le q < p$, then Proc(1), ..., Proc($q$) can each house $r + 1$ components and Proc($q+1$), ..., Proc($p$) can house $r$ components. In store-by-row we simply let Proc($\mu$) house $x(\mu{:}p{:}n)$.

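Similar options apply to the layout of a matrix. There are four obvious possibilities if $A \in \mathbb{R}^{n \times n}$ and (for simplicity) $n = rp$:

Orientation | Style | What is in Proc($\mu$)
Column | Contiguous | $A(:, 1 + (\mu-1)r : \mu r)$
Column | Wrap | $A(:, \mu{:}p{:}n)$
Row | Contiguous | $A(1 + (\mu-1)r : \mu r, :)$
Row | Wrap | $A(\mu{:}p{:}n, :)$

These strategies have block analogs. For example, if $A = [A_1, \ldots, A_N]$ is a block column partitioning, then we could arrange to have Proc($\mu$) store $A_i$ for $i = \mu{:}p{:}N$.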
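The two vector layouts, including the uneven case $n = pr + q$, can be tabulated with a short Python sketch. Indices are 1-based to match the text, and the function names are ours.

```python
def store_by_column(n, p):
    """Contiguous layout: the list of 1-based indices housed by each Proc(mu).
    With n = p*r + q, Proc(1..q) get r+1 components and the rest get r."""
    r, q = divmod(n, p)
    owned, start = [], 1
    for mu in range(1, p + 1):
        size = r + 1 if mu <= q else r
        owned.append(list(range(start, start + size)))
        start += size
    return owned

def store_by_row(n, p):
    """Wrap layout: Proc(mu) houses x(mu:p:n)."""
    return [list(range(mu, n + 1, p)) for mu in range(1, p + 1)]

print(store_by_column(14, 4))   # [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11], [12, 13, 14]]
print(store_by_row(14, 4))      # [[1, 5, 9, 13], [2, 6, 10, 14], [3, 7, 11], [4, 8, 12]]
```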
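6.1.4 Gaxpy on a Ring

We are now set to develop a ring algorithm for the gaxpy $z = y + Ax$ ($A \in \mathbb{R}^{n \times n}$, $x, y \in \mathbb{R}^n$). For clarity, assume that $n = rp$ where $p$ is the size of the ring. Partition the gaxpy as

$$\begin{bmatrix} z_1 \\ \vdots \\ z_p \end{bmatrix} = \begin{bmatrix} y_1 \\ \vdots \\ y_p \end{bmatrix} + \begin{bmatrix} A_{11} & \cdots & A_{1p} \\ \vdots & \ddots & \vdots \\ A_{p1} & \cdots & A_{pp} \end{bmatrix} \begin{bmatrix} x_1 \\ \vdots \\ x_p \end{bmatrix} \tag{6.1.2}$$

where $A_{ij} \in \mathbb{R}^{r \times r}$ and $x_i, y_i, z_i \in \mathbb{R}^r$. We assume that at the start of computation Proc($\mu$) houses $x_\mu$, $y_\mu$, and the $\mu$th block row of $A$. Upon completion we set as our goal the overwriting of $y_\mu$ by $z_\mu$. From the Proc($\mu$) perspective, the computation of

$$z_\mu = y_\mu + \sum_{\tau=1}^{p} A_{\mu\tau} x_\tau$$

involves local data ($A_{\mu\tau}$, $y_\mu$, $x_\mu$) and nonlocal data ($x_\tau$, $\tau \ne \mu$). To make the nonlocal portions of $x$ available, we circulate its subvectors around the ring. For example, in the $p = 3$ case we rotate $x_1$, $x_2$, and $x_3$ as follows (the $\mu$th entry of each vector is the subvector currently housed in Proc($\mu$)):

$$\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \rightarrow \begin{bmatrix} x_3 \\ x_1 \\ x_2 \end{bmatrix} \rightarrow \begin{bmatrix} x_2 \\ x_3 \\ x_1 \end{bmatrix} \rightarrow \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}.$$

When a subvector of $x$ “visits,” the host processor must incorporate the appropriate term into its running sum:

Step | Proc(1) | Proc(2) | Proc(3)
1 | $y_1 = y_1 + A_{13} x_3$ | $y_2 = y_2 + A_{21} x_1$ | $y_3 = y_3 + A_{32} x_2$
2 | $y_1 = y_1 + A_{12} x_2$ | $y_2 = y_2 + A_{23} x_3$ | $y_3 = y_3 + A_{31} x_1$
3 | $y_1 = y_1 + A_{11} x_1$ | $y_2 = y_2 + A_{22} x_2$ | $y_3 = y_3 + A_{33} x_3$

In general, the “merry-go-round” of $x$ subvectors makes $p$ “stops.” For each received $x$-subvector, a processor performs an $r$-by-$r$ gaxpy.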


Algorithm 6.1.1 Suppose $A \in \mathbb{R}^{n \times n}$, $x \in \mathbb{R}^n$, and $y \in \mathbb{R}^n$ are given and that $z = y + Ax$. If each processor in a $p$-processor ring executes the following node program and $n = rp$, then upon completion Proc($\mu$) houses $z(1 + (\mu-1)r : \mu r)$ in $y_{loc}$. Assume the following local memory initializations: $p$, $\mu$ (the node id), left and right (the neighbor id's), $n$, $row = 1 + (\mu-1)r : \mu r$, $A_{loc} = A(row, :)$, $x_{loc} = x(row)$, $y_{loc} = y(row)$.

for $t = 1{:}p$
    send($x_{loc}$, right)
    recv($x_{loc}$, left)
    $\tau = \mu - t$
    if $\tau \le 0$
        $\tau = \tau + p$
    end
    { $x_{loc} = x(1 + (\tau-1)r : \tau r)$ }
    $y_{loc} = y_{loc} + A_{loc}(:, 1 + (\tau-1)r : \tau r)\, x_{loc}$
end


The index $\tau$ names the currently available $x$ subvector. Once it is computed it is possible to carry out the update of the locally housed portion of $y$. The send-recv pair passes the currently housed $x$ subvector to the right and waits to receive the next one from the left. Synchronization is achieved because the local $y$ update cannot begin until the “new” $x$ subvector arrives. It is impossible for one processor to “race ahead” of the others or for an $x$ subvector to pass another in the merry-go-round. The algorithm is tailored to the ring topology in that only nearest neighbor communication is involved. The computation is also perfectly load balanced, meaning that each processor has the same amount of computation and communication. Load imbalance is discussed further in §6.1.7.

The design of a parallel program involves subtleties that do not arise in 
the uniprocessor setting. For example, if we inadvertently reverse the order 
of the send and the recv, then each processor starts its node program by 
waiting for a message from its left neighbor. Since that neighbor in turn is 
waiting for a message from its left neighbor, a state of deadlock results. 
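To make the send/recv semantics concrete, here is a sketch that runs Algorithm 6.1.1 with one Python thread per processor. The modelling choices are ours: `send` is a non-blocking `put` into the receiver's `queue.Queue`, `recv` is a blocking `get`, and each processor has a single FIFO inbox, which suffices because Proc($\mu$) only ever receives from its left neighbor (so the no-tagging assumption holds). Python threads give no real parallel speed-up here; the point is the message flow.

```python
import queue
import threading
import numpy as np

def ring_gaxpy(A, x, y, p):
    """Run Algorithm 6.1.1 on a simulated p-processor ring; returns z = y + Ax (n = r*p)."""
    n = x.size
    r = n // p
    inbox = {mu: queue.Queue() for mu in range(1, p + 1)}   # one FIFO inbox per Proc(mu)
    result = {}

    def block(j):                                        # 1-based block index -> slice
        return slice((j - 1) * r, j * r)

    def node_program(mu):
        right = 1 if mu == p else mu + 1
        A_loc, x_loc, y_loc = A[block(mu), :], x[block(mu)], y[block(mu)].copy()
        for t in range(1, p + 1):
            inbox[right].put(x_loc)       # send(x_loc, right): returns immediately
            x_loc = inbox[mu].get()       # recv(x_loc, left): only the left neighbor writes here
            tau = mu - t
            if tau <= 0:
                tau += p                  # x_loc now holds x_tau
            y_loc = y_loc + A_loc[:, block(tau)] @ x_loc
        result[mu] = y_loc

    threads = [threading.Thread(target=node_program, args=(mu,)) for mu in range(1, p + 1)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return np.concatenate([result[mu] for mu in range(1, p + 1)])

rng = np.random.default_rng(8)
A, x, y = rng.standard_normal((12, 12)), rng.standard_normal(12), rng.standard_normal(12)
z = ring_gaxpy(A, x, y, p=4)
```

Swapping the `put` and `get` lines in `node_program` makes every thread block in `get` before anything has been sent, which reproduces the deadlock just described.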


6.1.5 The Cost of Communication

Communication overheads can be estimated if we model the cost of sending and receiving a message. To that end we assume that a send or recv involving $m$ floating point numbers requires

$$\tau(m) = \alpha_d + \beta_d m \tag{6.1.3}$$

seconds to carry out. Here $\alpha_d$ is the time required to initiate the send or recv and $\beta_d$ is the reciprocal of the rate that a message can be transferred. Note that this model does not take into consideration the “distance” between the sender and receiver. Clearly, it takes longer to pass a message halfway around a ring than to a neighbor. That is why it is always desirable to arrange (if possible) a distributed computation so that communication is just between neighbors.

During each step in Algorithm 6.1.1 an $r$-vector is sent and received and $2r^2$ flops are performed. If the computation proceeds at $R$ flops per second and there is no idle waiting associated with the recv, then each $y_{loc}$ update requires approximately $(2r^2/R) + 2(\alpha_d + \beta_d r)$ seconds.

Another instructive statistic is the computation-to-communication ratio. For Algorithm 6.1.1 this is prescribed by

$$\frac{\text{Time spent computing}}{\text{Time spent communicating}} = \frac{2r^2/R}{2(\alpha_d + \beta_d r)}.$$

This fraction quantifies the overhead of communication relative to the volume of computation. Clearly, as $r = n/p$ grows, the fraction of time spent computing increases.¹


6.1.6 Efficiency and Speed-Up

The efficiency of a $p$-processor parallel algorithm is given by

$$E = \frac{T(1)}{p\,T(p)}$$

where $T(k)$ is the time required to execute the program on $k$ processors. If computation proceeds at $R$ flops/sec and communication is modeled by (6.1.3), then a reasonable estimate of $T(k)$ for Algorithm 6.1.1 is given by

$$T(k) = \sum_{t=1}^{k} \left[ \frac{2(n/k)^2}{R} + 2\left(\alpha_d + \beta_d \frac{n}{k}\right) \right] = \frac{2n^2}{Rk} + 2\alpha_d k + 2\beta_d n$$

for $k > 1$. This assumes no idle waiting. If $k = 1$, then no communication is required and $T(1) = 2n^2/R$. It follows that the efficiency

$$E = \frac{1}{1 + \dfrac{pR}{n^2}\left(\alpha_d p + \beta_d n\right)}$$

improves with increasing $n$ and degrades with increasing $p$ or $R$. In practice, benchmarking is the only dependable way to assess efficiency.
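The model is simple enough to explore directly. The following Python sketch evaluates $T(k)$ and $E$ exactly as defined above; the machine parameters in the usage lines are hypothetical values chosen only to show the trends (efficiency rising with $n$ and falling with $p$).

```python
def model_time(k, n, R, alpha_d, beta_d):
    """Modelled time T(k) of Algorithm 6.1.1 on k processors, assuming no idle waiting."""
    if k == 1:
        return 2 * n**2 / R                      # no communication needed
    return 2 * n**2 / (R * k) + 2 * alpha_d * k + 2 * beta_d * n

def efficiency(p, n, R, alpha_d, beta_d):
    """E = T(1) / (p T(p))."""
    return model_time(1, n, R, alpha_d, beta_d) / (p * model_time(p, n, R, alpha_d, beta_d))

# Hypothetical machine: 1e8 flop/s, 1e-5 s message start-up, 1e-7 s per number sent.
R, alpha_d, beta_d = 1e8, 1e-5, 1e-7
for n in (100, 1_000, 10_000):
    print(n, [round(efficiency(p, n, R, alpha_d, beta_d), 3) for p in (2, 8, 32)])
```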
A concept related to efficiency is speed-up. We say that a parallel algorithm for a particular problem achieves speed-up $S$ if

$$S = T_{seq}/T_{par}$$

where $T_{par}$ is the time required for execution of the parallel program and $T_{seq}$ is the time required by one processor when the best uniprocessor procedure is used. For some problems, the fastest sequential algorithm does not parallelize and so two distinct algorithms are involved in the speed-up assessment.


¹ We mention that these simple measures are not particularly illuminating in systems where the nodes are able to overlap computation and communication.

6.1.7 The Challenge of Load Balancing


If we apply Algorithm 6.1.1 to a matrix $A \in \mathbb{R}^{n \times n}$ that is lower triangular, then approximately half of the flops associated with the $y_{loc}$ updates are unnecessary because half of the $A_{ij}$ in (6.1.2) are zero. In particular, in the $\mu$th processor, $A_{loc}(:, 1 + (\tau-1)r : \tau r)$ is zero if $\tau > \mu$. Thus, if we guard the $y_{loc}$ update as follows,

if $\tau \le \mu$
    $y_{loc} = y_{loc} + A_{loc}(:, 1 + (\tau-1)r : \tau r)\, x_{loc}$
end

then the overall number of flops is roughly halved. This solves the superfluous flops problem but it creates a load imbalance problem. Proc($\mu$) now performs $\mu$ of the $r$-by-$r$ gaxpys, about $2\mu r^2$ flops, an increasing function of the processor id $\mu$. Consider the following $r = p = 3$ example:


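$$\begin{bmatrix} z_1 \\ z_2 \\ z_3 \\ z_4 \\ z_5 \\ z_6 \\ z_7 \\ z_8 \\ z_9 \end{bmatrix} = \left[\begin{array}{ccc|ccc|ccc}
\alpha & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
\alpha & \alpha & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
\alpha & \alpha & \alpha & 0 & 0 & 0 & 0 & 0 & 0 \\ \hline
\beta & \beta & \beta & \beta & 0 & 0 & 0 & 0 & 0 \\
\beta & \beta & \beta & \beta & \beta & 0 & 0 & 0 & 0 \\
\beta & \beta & \beta & \beta & \beta & \beta & 0 & 0 & 0 \\ \hline
\gamma & \gamma & \gamma & \gamma & \gamma & \gamma & \gamma & 0 & 0 \\
\gamma & \gamma & \gamma & \gamma & \gamma & \gamma & \gamma & \gamma & 0 \\
\gamma & \gamma & \gamma & \gamma & \gamma & \gamma & \gamma & \gamma & \gamma
\end{array}\right] \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \\ x_7 \\ x_8 \\ x_9 \end{bmatrix} + \begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5 \\ y_6 \\ y_7 \\ y_8 \\ y_9 \end{bmatrix}$$

Here, Proc(1) handles the $\alpha$ part, Proc(2) handles the $\beta$ part, and Proc(3) handles the $\gamma$ part.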

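However, if processors 1, 2, and 3 compute $(z_1, z_4, z_7)$, $(z_2, z_5, z_8)$, and $(z_3, z_6, z_9)$, respectively, then approximate load balancing results:

$$\begin{bmatrix} z_1 \\ z_4 \\ z_7 \\ z_2 \\ z_5 \\ z_8 \\ z_3 \\ z_6 \\ z_9 \end{bmatrix} = \left[\begin{array}{ccc|ccc|ccc}
\alpha & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
\beta & \beta & \beta & \beta & 0 & 0 & 0 & 0 & 0 \\
\gamma & \gamma & \gamma & \gamma & \gamma & \gamma & \gamma & 0 & 0 \\ \hline
\alpha & \alpha & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
\beta & \beta & \beta & \beta & \beta & 0 & 0 & 0 & 0 \\
\gamma & \gamma & \gamma & \gamma & \gamma & \gamma & \gamma & \gamma & 0 \\ \hline
\alpha & \alpha & \alpha & 0 & 0 & 0 & 0 & 0 & 0 \\
\beta & \beta & \beta & \beta & \beta & \beta & 0 & 0 & 0 \\
\gamma & \gamma & \gamma & \gamma & \gamma & \gamma & \gamma & \gamma & \gamma
\end{array}\right] \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \\ x_7 \\ x_8 \\ x_9 \end{bmatrix} + \begin{bmatrix} y_1 \\ y_4 \\ y_7 \\ y_2 \\ y_5 \\ y_8 \\ y_3 \\ y_6 \\ y_9 \end{bmatrix}$$

The amount of arithmetic still increases with $\mu$, but the effect is not noticeable if $n \gg p$.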
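The imbalance, and how the wrap assignment reduces it, can be quantified with a few lines of Python. This sketch (function name ours) counts 2 flops per nonzero of each owned row, i.e., it assumes the triangular structure is fully exploited, which is what the index bookkeeping developed next achieves.

```python
def triangular_gaxpy_flops(n, p, layout):
    """Flops (2 per nonzero) each processor spends on its rows of a lower triangular
    gaxpy, where row i has i nonzeros. layout: 'contiguous' or 'wrap'."""
    r = n // p
    counts = []
    for mu in range(1, p + 1):
        if layout == "contiguous":
            rows = range(1 + (mu - 1) * r, mu * r + 1)   # rows 1+(mu-1)r : mu*r
        else:
            rows = range(mu, n + 1, p)                   # rows mu:p:n
        counts.append(sum(2 * i for i in rows))
    return counts

print(triangular_gaxpy_flops(9, 3, "contiguous"))   # [12, 30, 48]
print(triangular_gaxpy_flops(9, 3, "wrap"))         # [24, 30, 36]
```

For larger $n$ with $p$ fixed, the ratio between the busiest and idlest processor tends to 1 under the wrap assignment but stays near $2p - 1$ under the contiguous one.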

The development of the general algorithm requires some index manipulation. Assume that Proc($\mu$) is initialized with $A_{loc} = A(\mu{:}p{:}n, :)$ and $y_{loc} = y(\mu{:}p{:}n)$, and assume that the contiguous $x$-subvectors circulate as before. If at some stage $x_{loc}$ contains $x(1 + (\tau-1)r : \tau r)$, then the update

$$y_{loc} = y_{loc} + A_{loc}(:, 1 + (\tau-1)r : \tau r)\, x_{loc}$$

implements

$$y(\mu{:}p{:}n) = y(\mu{:}p{:}n) + A(\mu{:}p{:}n, 1 + (\tau-1)r : \tau r)\, x(1 + (\tau-1)r : \tau r).$$

To exploit the triangular structure of $A$ in the $y_{loc}$ computation, we express the gaxpy as a double loop:

for $\alpha = 1{:}r$
    for $\beta = 1{:}r$
        $y_{loc}(\alpha) = y_{loc}(\alpha) + A_{loc}(\alpha, \beta + (\tau-1)r)\, x_{loc}(\beta)$
    end
end

The $A_{loc}$ reference refers to $A(\mu + (\alpha-1)p, \beta + (\tau-1)r)$, which is zero unless the column index is less than or equal to the row index. Abbreviating the inner loop range with this in mind we obtain


Algorithm 6.1.2 Suppose $A \in \mathbb{R}^{n \times n}$, $x \in \mathbb{R}^n$ and $y \in \mathbb{R}^n$ are given and that $z = y + Ax$. Assume that $n = rp$ and that $A$ is lower triangular. If each processor in a $p$-processor ring executes the following node program, then upon completion Proc($\mu$) houses $z(\mu{:}p{:}n)$ in $y_{loc}$. Assume the following local memory initializations: $p$, $\mu$ (the node id), left and right (the neighbor id's), $n$, $A_{loc} = A(\mu{:}p{:}n, :)$, $y_{loc} = y(\mu{:}p{:}n)$, and $x_{loc} = x(1 + (\mu-1)r : \mu r)$.

$r = n/p$
for $t = 1{:}p$
    send($x_{loc}$, right)
    recv($x_{loc}$, left)
    $\tau = \mu - t$
    if $\tau \le 0$
        $\tau = \tau + p$
    end
    { $x_{loc} = x(1 + (\tau-1)r : \tau r)$ }
    for $\alpha = 1{:}r$
        for $\beta = 1{:}\min(r, \mu + (\alpha-1)p - (\tau-1)r)$
            $y_{loc}(\alpha) = y_{loc}(\alpha) + A_{loc}(\alpha, \beta + (\tau-1)r)\, x_{loc}(\beta)$
        end
    end
end

Having to map indices back and forth between “node space” and “global space” is one aspect of distributed matrix computations that requires care and (hopefully) compiler assistance.
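The index bookkeeping of Algorithm 6.1.2 is easy to get wrong, so a sequential simulation is a useful check. In this Python sketch (ours) all $p$ node programs advance in lockstep, and the send/recv pair is modelled by shifting every $x$ block one processor to the right; the loops keep the 1-based indices of the text.

```python
import numpy as np

def ring_gaxpy_lower_wrap(A, x, y, p):
    """Sequential simulation of Algorithm 6.1.2 (A lower triangular, n = r*p).
    Proc(mu) owns rows mu:p:n of A and y; contiguous x blocks circulate to the right."""
    n = x.size
    r = n // p
    procs = range(1, p + 1)
    A_loc = {mu: A[mu - 1::p, :] for mu in procs}               # A(mu:p:n, :)
    y_loc = {mu: y[mu - 1::p].astype(float) for mu in procs}    # y(mu:p:n), copied
    x_loc = {mu: x[(mu - 1) * r:mu * r] for mu in procs}        # x(1+(mu-1)r : mu*r)
    for t in range(1, p + 1):
        # send(x_loc, right) / recv(x_loc, left), applied to all processors at once
        x_loc = {mu: x_loc[p if mu == 1 else mu - 1] for mu in procs}
        for mu in procs:
            tau = mu - t
            if tau <= 0:
                tau += p
            off = (tau - 1) * r
            for a in range(1, r + 1):              # local row a = global row mu+(a-1)p
                for beta in range(1, min(r, mu + (a - 1) * p - off) + 1):
                    y_loc[mu][a - 1] += A_loc[mu][a - 1, beta - 1 + off] * x_loc[mu][beta - 1]
    z = np.empty(n)
    for mu in procs:
        z[mu - 1::p] = y_loc[mu]                   # z(mu:p:n) lives in Proc(mu)
    return z

rng = np.random.default_rng(9)
n, p = 12, 4
A = np.tril(rng.standard_normal((n, n)))           # hypothetical lower triangular A
x, y = rng.standard_normal(n), rng.standard_normal(n)
z = ring_gaxpy_lower_wrap(A, x, y, p)
```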
6.1.8 Tradeoffs

As we did in §1.1, let us develop a column-oriented gaxpy and anticipate its performance. With the block column partitioning

$$A = [A_1, \ldots, A_p], \qquad A_i \in \mathbb{R}^{n \times r}, \quad r = n/p,$$

the gaxpy $z = y + Ax$ becomes

$$z = y + \sum_{\mu=1}^{p} A_\mu x_\mu$$

where $x_\mu = x(1 + (\mu-1)r : \mu r)$. Assume that Proc($\mu$) contains $A_\mu$ and $x_\mu$. Its contribution to the gaxpy is the product $A_\mu x_\mu$ and involves local data. However, these products must be summed. We assign this task to Proc(1), which we assume contains $y$. The strategy is thus for each processor to compute $A_\mu x_\mu$ and to send the result to Proc(1).


Algorithm 6.1.3 Suppose $A \in \mathbb{R}^{n \times n}$, $x \in \mathbb{R}^n$ and $y \in \mathbb{R}^n$ are given and that $z = y + Ax$. If each processor in a $p$-processor network executes the following node program and $n = rp$, then upon completion Proc(1) houses $z$. Assume the following local memory initializations: $p$, $\mu$ (the node id), $n$, $x_{loc} = x(1 + (\mu-1)r : \mu r)$, $A_{loc} = A(:, 1 + (\mu-1)r : \mu r)$, and (in Proc(1) only) $y_{loc} = y$.

if $\mu = 1$
    $y_{loc} = y_{loc} + A_{loc} x_{loc}$
    for $t = 2{:}p$
        recv($w_{loc}$, $t$)
        $y_{loc} = y_{loc} + w_{loc}$
    end
else
    $w_{loc} = A_{loc} x_{loc}$
    send($w_{loc}$, 1)
end

At first glance this seems to be much less attractive than the row-oriented Algorithm 6.1.1. The additional responsibilities of Proc(1) mean that it has more arithmetic to perform by a factor of about

$$\frac{2n^2/p + np}{2n^2/p} = 1 + \frac{p^2}{2n}$$

and more messages to process by a factor of about $p$. This imbalance becomes less critical if $n \gg p$ and the communication parameters $\alpha_d$ and $\beta_d$ are small enough. Another possible mitigating factor is that Algorithm 6.1.3 manipulates length-$n$ vectors whereas Algorithm 6.1.1 works with length-$n/p$ vectors. If the nodes are capable of vector arithmetic, then the longer vectors may raise the level of performance.
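For comparison with the ring version, here is a sequential Python simulation of Algorithm 6.1.3 (our sketch; a dictionary stands in for the messages that travel to Proc(1)). It makes the extra work of Proc(1) visible: it forms its own product and then performs $p - 1$ additional length-$n$ vector additions.

```python
import numpy as np

def column_gaxpy(A, x, y, p):
    """Sequential simulation of Algorithm 6.1.3 (n = r*p); returns z as housed by Proc(1)."""
    n = x.size
    r = n // p
    cols = {mu: slice((mu - 1) * r, mu * r) for mu in range(1, p + 1)}
    messages = {}                                   # stands in for the network
    for mu in range(2, p + 1):                      # every Proc(mu) with mu > 1:
        w_loc = A[:, cols[mu]] @ x[cols[mu]]        #   w_loc = A_loc x_loc (about 2n^2/p flops)
        messages[mu] = w_loc                        #   send(w_loc, 1)
    y_loc = y + A[:, cols[1]] @ x[cols[1]]          # Proc(1): its own contribution ...
    for t in range(2, p + 1):
        y_loc = y_loc + messages[t]                 # ... then recv(w_loc, t): n more flops each
    return y_loc

rng = np.random.default_rng(11)
A = rng.standard_normal((12, 12))                   # hypothetical data
x, y = rng.standard_normal(12), rng.standard_normal(12)
z = column_gaxpy(A, x, y, p=4)
```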

This brief comparison of Algorithms 6.1.1 and 6.1.3 reminds us once 
again that different implementations of the same computation can have 
very different performance characteristics. 


6.1.9 Shared Memory Systems 


We now discuss the gaxpy problem for a shared memory multiprocessor. In this environment each processor has access to a common, global memory as depicted in Figure 6.1.2. Communication between processors is achieved by reading and writing to global variables that reside in the global memory. Each processor executes its own local program and has its own local memory. Data flows to and from the global memory during execution.

FIGURE 6.1.2 A Four-Processor Shared Memory System

All the concerns that attend distributed memory computation are with us in modified form. The overall procedure should be load balanced and the computations should be arranged so that the individual processors have to wait as little as possible for something useful to compute. The traffic between the global and local memories must be managed carefully, because the extent of such data transfers is typically a significant overhead. (It corresponds to interprocessor communication in the distributed memory setting and to data motion up and down a memory hierarchy as discussed in §1.4.5.) The nature of the physical connection between the processors and the shared memory is very important and can affect algorithmic development. However, for simplicity we regard this aspect of the system as a black box as shown in Figure 6.1.2.
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6.1.10 A Shared Memory Gaxpy 
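Consider the following partitioning of the n-by-n gaxpy problem $z = y + Ax$:

$$\begin{bmatrix} z_1 \\ \vdots \\ z_p \end{bmatrix} = \begin{bmatrix} y_1 \\ \vdots \\ y_p \end{bmatrix} + \begin{bmatrix} A_1 \\ \vdots \\ A_p \end{bmatrix} x. \qquad (6.1.4)$$

Here we assume that $n = rp$ and that $A_\mu \in \mathbb{R}^{r\times n}$, $y_\mu \in \mathbb{R}^r$, and $z_\mu \in \mathbb{R}^r$.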
We use the following algorithm to introduce the basic ideas and notations. 


Algorithm 6.1.4 Suppose $A \in \mathbb{R}^{n\times n}$, $x \in \mathbb{R}^n$, and $y \in \mathbb{R}^n$ reside in a
global memory accessible to p processors. If n = rp and each processor
executes the following algorithm, then upon completion, y is overwritten
by z = y + Ax. Assume the following initializations in each local memory:
p, μ (the node id), and n.


    r = n/p
    row = 1+(μ-1)r:μr
    x_loc = x
    y_loc = y(row)
    for j = 1:n
        a_loc = A(row, j)
        y_loc = y_loc + a_loc x_loc(j)
    end
    y(row) = y_loc
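To make the row partitioning concrete, here is a minimal Python sketch of Algorithm 6.1.4 with one thread per processor. It assumes that plain Python lists stand in for the global memory and that the node id μ is passed as `mu`; the helper names `gaxpy_node` and `shared_gaxpy` are hypothetical. Python threads illustrate the access pattern only, not real parallel speedup.

```python
import threading


def gaxpy_node(mu, p, A, x, y):
    """Node program for Proc(mu), mu = 1..p: y(row) = y(row) + A(row, :) x."""
    n = len(x)
    r = n // p                              # assumes n = r*p
    row = range((mu - 1) * r, mu * r)       # 0-based form of 1+(mu-1)r : mu*r
    x_loc = list(x)                         # global read:  x_loc = x
    y_loc = [y[i] for i in row]             # global read:  y_loc = y(row)
    for j in range(n):
        a_loc = [A[i][j] for i in row]      # global read:  a_loc = A(row, j)
        for k in range(r):
            y_loc[k] += a_loc[k] * x_loc[j]
    for k, i in enumerate(row):             # global write: y(row) = y_loc
        y[i] = y_loc[k]


def shared_gaxpy(A, x, y, p):
    """Run the node program on p threads; y is overwritten by y + A x."""
    threads = [threading.Thread(target=gaxpy_node, args=(mu, p, A, x, y))
               for mu in range(1, p + 1)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return y


A = [[1, 2, 0, 1], [0, 1, 3, 0], [2, 0, 1, 1], [1, 1, 1, 1]]
x = [1, 2, 3, 4]
y = [1, 1, 1, 1]
print(shared_gaxpy(A, x, y, p=2))           # [10, 12, 10, 11]
```

Because each thread writes only its own rows of y, nothing beyond the final join is needed to coordinate the threads.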


We assume that a copy of this program resides in each processor. Float- 
ing point variables that are local to an individual processor have a "loc" 
subscript. 

Data is transferred to and from the global memory during the execution
of Algorithm 6.1.4. There are two global memory reads before the loop
(x_loc = x and y_loc = y(row)), one read each time through the loop
(a_loc = A(row, j)), and one write after the loop (y(row) = y_loc).

Only one processor writes to a given global memory location in y, and 
so there is no need to synchronize the participating processors. Each has 
a completely independent part of the overall gaxpy operation and does not 
have to monitor the progress of the other processors. The computation is
statically scheduled because the partitioning of work is determined before 
execution. 

If A is lower triangular, then steps have to be taken to preserve the
load balancing in Algorithm 6.1.4. As we discovered in §6.1.7, the wrap
mapping is a vehicle for doing this. Assigning Proc(μ) the computation of
$z(\mu:p:n) = y(\mu:p:n) + A(\mu:p:n, :)\,x$ effectively partitions the $n^2$ flops
among the p processors.
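The effect of the wrap mapping on a lower triangular A can be checked with a small flop count. The sketch below charges an idealized 2i flops to row i (an assumption that ignores lower-order terms) and compares a contiguous row mapping with the wrap mapping; `row_flops` and `flops_per_proc` are hypothetical helpers.

```python
def row_flops(i):
    """Idealized cost of row i (1-based) of a lower triangular gaxpy: i multiply-adds."""
    return 2 * i


def flops_per_proc(n, p, mapping):
    """Flops charged to each of p processors under a contiguous or wrap row mapping."""
    totals = [0] * p
    r = n // p                              # assumes n = r*p for the block mapping
    for i in range(1, n + 1):
        if mapping == "block":
            mu = (i - 1) // r               # Proc(mu+1) gets rows 1+mu*r : (mu+1)*r
        else:
            mu = (i - 1) % p                # Proc(mu+1) gets rows (mu+1):p:n
        totals[mu] += row_flops(i)
    return totals


print(flops_per_proc(12, 3, "block"))       # [20, 52, 84]
print(flops_per_proc(12, 3, "wrap"))        # [44, 52, 60]
```

The total work is the same in both cases; only its distribution changes.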


6.1. Basic CONCEPTS 287 


6.1.11 Memory Traffic Overhead 


It is important to recognize that overall performance depends strongly on 
the overheads associated with the reads and writes to the global memory. 
If such a data transfer involves m floating point numbers, then we model
the transfer time by

$$T(m) = \alpha_s + \beta_s m. \qquad (6.1.5)$$

The parameter $\alpha_s$ represents a start-up overhead and $\beta_s$ is the reciprocal
transfer rate. We modeled interprocessor communication in the distributed
environment exactly the same way. (See (6.1.3).)

Accounting for all the shared memory reads and writes in Algorithm
6.1.4, we see that each processor spends time

$$T \approx (n+3)\alpha_s + \frac{n^2}{p}\beta_s$$

communicating with global memory.

We organized the computation so that one column of A(row, :) is read
at a time from shared memory. If the local memory is large enough, then
the loop in Algorithm 6.1.4 can be replaced with

    A_loc = A(row, :)
    y_loc = y_loc + A_loc x_loc

This changes the communication overhead (three reads and one write) to

$$T \approx 4\alpha_s + \frac{n^2}{p}\beta_s,$$

a significant improvement if the start-up parameter $\alpha_s$ is large.
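The model (6.1.5) can be applied transfer by transfer. The following sketch (hypothetical helper names, with $\alpha_s$ and $\beta_s$ passed as plain numbers) counts every global read and write of the row-partitioned gaxpy, so it keeps the lower-order terms that the leading-order estimates drop.

```python
def transfer_time(m, alpha_s, beta_s):
    """Model (6.1.5): time to move m floating point numbers to or from global memory."""
    return alpha_s + beta_s * m


def gaxpy_traffic(n, p, alpha_s, beta_s, whole_block):
    """Global-memory time for one processor in the row-partitioned gaxpy (n = r*p)."""
    r = n // p
    sizes = [n, r]                  # read x, read y(row)
    if whole_block:
        sizes.append(r * n)         # one read of A(row, :)
    else:
        sizes += [r] * n            # n reads of A(row, j), one per loop pass
    sizes.append(r)                 # write y(row)
    return sum(transfer_time(m, alpha_s, beta_s) for m in sizes)


# With a large start-up cost the single block read is much cheaper.
print(gaxpy_traffic(8, 2, 100, 1, whole_block=False))   # 1148
print(gaxpy_traffic(8, 2, 100, 1, whole_block=True))    # 448
```

Both versions move the same numbers; they differ only in how many start-ups are paid.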


6.1.12 Barrier Synchronization 


Let us consider the shared memory version of Algorithm 6.1.4 in which
the gaxpy is column oriented. Assume n = rp and col = 1+(μ-1)r:μr.
A reasonable idea is to use a global array W(1:n, 1:p) to house the
products A(:, col)x(col) produced by each processor, and then have some
chosen processor (say Proc(1)) add its columns:


    A_loc = A(:, col); x_loc = x(col); w_loc = A_loc x_loc; W(:, μ) = w_loc
    if μ = 1
        y_loc = y
        for j = 1:p
            w_loc = W(:, j)
            y_loc = y_loc + w_loc
        end
        y = y_loc
    end
However, this strategy is seriously flawed because there is no guarantee that
W(1:n, 1:p) is fully initialized when Proc(1) begins the summation process.

What we need is a synchronization construct that can delay the Proc(1)
summation until all the processors have computed and stored their
contributions in the W array. For this purpose many shared memory systems
support some version of the barrier construct, which we introduce in the
following algorithm:


Algorithm 6.1.5 Suppose $A \in \mathbb{R}^{n\times n}$, $x \in \mathbb{R}^n$, and $y \in \mathbb{R}^n$ reside in a
global memory accessible to p processors. If n = rp and each processor
executes the following algorithm, then upon completion y is overwritten by
y + Ax. Assume the following initializations in each local memory: p, μ
(the node id), and n.


    r = n/p; col = 1+(μ-1)r:μr; A_loc = A(:, col); x_loc = x(col)
    w_loc = A_loc x_loc
    W(:, μ) = w_loc
    barrier
    if μ = 1
        y_loc = y
        for j = 1:p
            w_loc = W(:, j)
            y_loc = y_loc + w_loc
        end
        y = y_loc
    end


To understand the barrier, it is convenient to regard a processor as either 
blocked or free. A processor is blocked and suspends execution when it 
executes the barrier. After the pth processor is blocked, all the processors 
return to the “free state” and resume execution. Think of the barrier as a
treacherous stream to be traversed by all p processors. For safety, they
all congregate on the bank before attempting to cross. When the last
member of the party arrives, they ford the stream in unison and resume
their individual treks.

In Algorithm 6.1.5, the processors are blocked after computing their 
portion of the matrix-vector product. We cannot predict the order in which 
these blockings occur, but once the last processor reaches the barrier, they 
are all released and Proc(1) can carry out the vector summation. 
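Python's `threading.Barrier` behaves like the barrier just described: each call to `wait()` blocks until all `p` parties have arrived. The sketch below uses it to mimic Algorithm 6.1.5, assuming a list of lists `W` as the global workspace and threads as processors; the function name `barrier_gaxpy` is hypothetical.

```python
import threading


def barrier_gaxpy(A, x, y, p):
    """Sketch of Algorithm 6.1.5: column-oriented gaxpy, y = y + A x, on p threads."""
    n = len(x)
    r = n // p                                   # assumes n = r*p
    W = [[0] * p for _ in range(n)]              # global array W(1:n, 1:p)
    barrier = threading.Barrier(p)               # all p threads must arrive before any proceeds

    def node(mu):                                # mu = 1..p
        col = range((mu - 1) * r, mu * r)
        w_loc = [sum(A[i][j] * x[j] for j in col) for i in range(n)]   # A(:, col) x(col)
        for i in range(n):
            W[i][mu - 1] = w_loc[i]              # W(:, mu) = w_loc
        barrier.wait()                           # every column of W is now stored
        if mu == 1:                              # Proc(1) sums the columns of W
            y_loc = list(y)
            for j in range(p):
                for i in range(n):
                    y_loc[i] += W[i][j]
            y[:] = y_loc

    threads = [threading.Thread(target=node, args=(mu,)) for mu in range(1, p + 1)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return y


A = [[1, 2, 0, 1], [0, 1, 3, 0], [2, 0, 1, 1], [1, 1, 1, 1]]
print(barrier_gaxpy(A, [1, 2, 3, 4], [1, 1, 1, 1], p=2))   # [10, 12, 10, 11]
```

Without the `barrier.wait()` call, Proc(1) could read columns of `W` that another thread has not yet written.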


6.1.13 Dynamic Scheduling 


Instead of having one processor in charge of the vector summation, it is 
tempting to have each processor add its contribution directly to the global 
variable y. For Proc(μ), this means executing the following:

    r = n/p; col = 1+(μ-1)r:μr; A_loc = A(:, col); x_loc = x(col)
    w_loc = A_loc x_loc
    y_loc = y; y_loc = y_loc + w_loc; y = y_loc


However, there is a problem with the read-update-write triplet

    y_loc = y; y_loc = y_loc + w_loc; y = y_loc


Indeed, if more than one processor is executing this code fragment at the 
same time, then there may be a loss of information. Consider the following 
sequence: 


Proc(1) reads y 
Proc(2) reads y 
Proc(1) writes y 
Proc(2) writes y 


The contribution of Proc(1) is lost because Proc(1) and Proc(2) obtain the 
same version of y. As a result, the effect of the Proc(1) write is erased by 
the Proc(2) write. 
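The lost update can be reproduced deterministically by replaying the read and write events in a fixed order instead of relying on real thread timing. In this sketch a scalar y stands in for the vector, and the (processor, event) schedule format and the name `run_schedule` are illustrative assumptions.

```python
def run_schedule(y0, w, schedule):
    """Replay (proc, event) pairs against a shared scalar y.

    "read" performs y_loc = y; "write" performs y_loc = y_loc + w_loc; y = y_loc.
    """
    y = y0
    y_loc = {}                                   # each processor's private copy
    for mu, event in schedule:
        if event == "read":
            y_loc[mu] = y
        else:
            y_loc[mu] = y_loc[mu] + w[mu]
            y = y_loc[mu]
    return y


w = {1: 10, 2: 5}                                # w_loc for Proc(1) and Proc(2)
serial = [(1, "read"), (1, "write"), (2, "read"), (2, "write")]
racy = [(1, "read"), (2, "read"), (1, "write"), (2, "write")]
print(run_schedule(0, w, serial))                # 15
print(run_schedule(0, w, racy))                  # 5: Proc(1)'s contribution is lost
```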

To prevent this kind of thing from happening, most shared memory
systems support the idea of a critical section. These are special, isolated 
portions of a node program that require a “key” to enter. Throughout the 
system, there is only one key and so the net effect is that only one processor 
can be executing in a critical section at any given time. 


Algorithm 6.1.6 Suppose $A \in \mathbb{R}^{n\times n}$, $x \in \mathbb{R}^n$, and $y \in \mathbb{R}^n$ reside in a
global memory accessible to p processors. If n = pr and each processor
executes the following algorithm, then upon completion, y is overwritten
by y + Ax. Assume the following initializations in each local memory: p, μ
(the node id), and n.


    r = n/p; col = 1+(μ-1)r:μr; A_loc = A(:, col); x_loc = x(col)
    w_loc = A_loc x_loc
    begin critical section
        y_loc = y
        y_loc = y_loc + w_loc
        y = y_loc
    end critical section


This use of the critical section concept controls the update of y in a way
that ensures correctness. The algorithm is dynamically scheduled because 
the order in which the summations occur is determined as the computation 
unfolds. Dynamic scheduling is very important in problems with irregular 
structure. 
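A lock can play the role of the single key. In the sketch below (hypothetical function name, Python threads standing in for processors), a `threading.Lock` used as a context manager guards the read-update-write triplet of Algorithm 6.1.6. The order of the additions is decided at run time, but no contribution is lost.

```python
import threading


def critical_gaxpy(A, x, y, p):
    """Sketch of Algorithm 6.1.6: each thread adds A(:, col) x(col) into the shared y."""
    n = len(x)
    r = n // p                                  # assumes n = r*p
    key = threading.Lock()                      # the single system-wide "key"

    def node(mu):                               # mu = 1..p
        col = range((mu - 1) * r, mu * r)
        w_loc = [sum(A[i][j] * x[j] for j in col) for i in range(n)]
        with key:                               # begin critical section
            y_loc = list(y)                     # y_loc = y
            y_loc = [a + b for a, b in zip(y_loc, w_loc)]   # y_loc = y_loc + w_loc
            y[:] = y_loc                        # y = y_loc
        # end critical section: the lock is released on leaving the with-block

    threads = [threading.Thread(target=node, args=(mu,)) for mu in range(1, p + 1)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return y


A = [[1, 2, 0, 1], [0, 1, 3, 0], [2, 0, 1, 1], [1, 1, 1, 1]]
print(critical_gaxpy(A, [1, 2, 3, 4], [1, 1, 1, 1], p=4))   # [10, 12, 10, 11]
```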


290 CHAPTER 6. PARALLEL MATRIX COMPUTATIONS 


Problems 


P6.1.1 Modify Algorithm 6.1.1 so that it can handle arbitrary n. 

P6.1.2 Modify Algorithm 6.1.2 so that it efficiently handles the upper triangular case.
P6.1.3 (a) Modify Algorithms 6.1.3 and 6.1.4 so that they overwrite y with $z = y + A^m x$
for a given positive integer m that is available to each processor. (b) Modify Algorithms
6.1.3 and 6.1.4 so that y is overwritten by $z = y + A^TAx$.

P6.1.4 Modify Algorithm 6.1.3 so that upon completion, the local array A_loc in Proc(μ)
houses the μth block column of $A + xy^T$.


P6.1.5 Modify Algorithm 6.1.4 so that (a) A is overwritten by the outer product update
$A + xy^T$, (b) x is overwritten with Ax, (c) y is overwritten by a unit 2-norm vector in
the direction of y + A*x, and (d) it efficiently handles the case when A is lower triangular.
G. C. Fox, M. A. Johnson, G. A. Lyzenga, S. W. Otto, J. K. Salmon, and D. W. Walker
(1988). Solving Problems on Concurrent Processors, Volume 1, Prentice Hall,
Englewood Cliffs, NJ.

D. P. Bertsekas and J. N. Tsitsiklis (1989). Parallel and Distributed Computation:
Numerical Methods, Prentice Hall, Englewood Cliffs, NJ.

S. Lakshmivarahan and S. K. Dhall (1990). Analysis and Design of Parallel Algorithms:
Arithmetic and Matrix Problems, McGraw-Hill, New York.

T. L. Freeman and C. Phillips (1992). Parallel Numerical Algorithms, Prentice Hall,
New York.

F. T. Leighton (1992). Introduction to Parallel Algorithms and Architectures, Morgan
Kaufmann, San Mateo, CA.

G. C. Fox, R. D. Williams, and P. C. Messina (1994). Parallel Computing Works!,
Morgan Kaufmann, San Francisco.

V. Kumar, A. Grama, A. Gupta, and G. Karypis (1994). Introduction to Parallel
Computing: Design and Analysis of Algorithms, Benjamin/Cummings, Reading, MA.

E. F. Van de Velde (1994). Concurrent Scientific Computing, Springer-Verlag, New York.

M. Cosnard and D. Trystram (1995). Parallel Algorithms and Architectures,
International Thomson Computer Press, New York.

Here are some general references that are more specific to parallel matrix computations: 


V. Faddeeva and D. Faddeev (1977). “Parallel Computations in Linear Algebra,” Kibernetica
6, 28-40.

D. Heller (1978). “A Survey of Parallel Algorithms in Numerical Linear Algebra,” SIAM
Review 20, 740-777.
6.2 Matrix Multiplication

In this section we develop two parallel algorithms for matrix-matrix
multiplication. A shared memory implementation is used to illustrate the effect
of blocking on granularity and load balancing. A torus implementation is
designed to convey the spirit of two-dimensional data flow.


6.2.1 A Block Gaxpy Procedure 


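Suppose $A, B, C \in \mathbb{R}^{n\times n}$ with B upper triangular and consider the
computation of the matrix multiply update

$$D = C + AB \qquad (6.2.1)$$

on a shared memory computer with p processors. Assume that n = rkp
and partition the update

$$[\,D_1, \ldots, D_{kp}\,] = [\,C_1, \ldots, C_{kp}\,] + [\,A_1, \ldots, A_{kp}\,]\,[\,B_1, \ldots, B_{kp}\,] \qquad (6.2.2)$$

where each block column has width r = n/(kp). If

$$B_j = \begin{bmatrix} B_{1j} \\ \vdots \\ B_{jj} \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \qquad B_{\tau j} \in \mathbb{R}^{r\times r},$$

then

$$D_j = C_j + AB_j = C_j + \sum_{\tau=1}^{j} A_\tau B_{\tau j}. \qquad (6.2.3)$$

The number of flops required to compute $D_j$ is given by

$$f_j = 2nr^2 j = \left(\frac{2n^3}{k^2p^2}\right) j.$$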

This is an increasing function of j because B is upper triangular. As we 
discovered in the previous section, the wrap mapping is the way to solve 
load imbalance problems that result from triangular matrix structure. This 
suggests that we assign Proc(μ) the task of computing $D_j$ for j = μ:p:kp.


Algorithm 6.2.1 Suppose A, B, and C are n-by-n matrices that reside 
in a global memory accessible to p processors. Assume that B is upper 
triangular and n = rkp. If each processor executes the following algorithm, 
then upon completion C is overwritten by D = C + AB. Assume the
following initializations in each local memory: n, r, k, p and μ (the node
id).

    for j = μ:p:kp
        {Compute D_j.}
        B_loc = B(1:jr, 1+(j-1)r:jr)
        C_loc = C(:, 1+(j-1)r:jr)
        for τ = 1:j
            col = 1+(τ-1)r:τr
            A_loc = A(:, col)
            C_loc = C_loc + A_loc B_loc(col, :)
        end
        C(:, 1+(j-1)r:jr) = C_loc
    end

Let us examine the degree of load balancing as a function of the parameter
k. For Proc(μ), the number of flops required is given by

$$F(\mu) = \sum_{i=1}^{k} f_{\mu+(i-1)p} = \left(\mu + \frac{(k-1)p}{2}\right)\frac{2n^3}{kp^2}.$$

The quotient F(p)/F(1) is a measure of load balancing from the flop point
of view. Since

$$\frac{F(p)}{F(1)} = \frac{p + (k-1)p/2}{1 + (k-1)p/2} = 1 + \frac{2(p-1)}{2 + (k-1)p},$$

we see that arithmetic balance improves with increasing k. A similar
analysis shows that the communication overheads are well balanced as k
increases.
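The load-balance formulas can be checked numerically. The sketch below sums $f_j = 2nr^2 j$ over j = μ:p:kp with exact rational arithmetic from `fractions.Fraction` and compares F(p)/F(1) with the closed form; the helper names `proc_flops` and `imbalance` are hypothetical, and n = 96, p = 4 are illustrative values.

```python
from fractions import Fraction


def proc_flops(mu, n, k, p):
    """Flops for Proc(mu) in Algorithm 6.2.1: the sum of f_j = 2 n r^2 j over j = mu:p:kp."""
    r = Fraction(n, k * p)                      # block width r = n/(kp)
    return sum(2 * n * r * r * j for j in range(mu, k * p + 1, p))


def imbalance(n, k, p):
    """The load-balance measure F(p)/F(1)."""
    return proc_flops(p, n, k, p) / proc_flops(1, n, k, p)


p, n = 4, 96
for k in (1, 2, 4, 8):
    closed_form = 1 + Fraction(2 * (p - 1), 2 + (k - 1) * p)
    print(k, imbalance(n, k, p), closed_form)   # the two ratios agree and approach 1
```

The ratio does not depend on n, so any n divisible by kp gives the same values.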

On the other hand, the total number of global memory reads and writes 
associated with Algorithm 6.2.1 increases with the square of k. If the
start-up parameter $\alpha_s$ in (6.1.5) is large, then performance can degrade with
increased k.

The optimum choice for k given these two opposing forces is system 
dependent. If communication is fast, then smaller tasks can be supported 
without penalty and this makes it easier to achieve load balancing. A mul- 
tiprocessor with this attribute supports fine-grained parallelism. However, 
if granularity is too fine in a system with high-performance nodes, then it 
may be impossible for the node programs to perform at level-2 or level-3 
speeds simply because there just is not enough local linear algebra. Again, 
benchmarking is the only way to clarify these issues. 


6.2.2 Torus 


A torus is a two-dimensional processor array in which each row and
column is a ring. See Figure 6.2.1. A processor id in this context is an
ordered pair and each processor has four neighbors. In the displayed
example, Proc(1,3) has west neighbor Proc(1,2), east neighbor Proc(1,4), south
neighbor Proc(2,3), and north neighbor Proc(4,3).

FIGURE 6.2.1 A Four-by-Four Torus

To show what it is like to organize a toroidal matrix computation, we
develop an algorithm for the matrix multiplication D = C + AB where
$A, B, C \in \mathbb{R}^{n\times n}$. Assume that the torus is $p_1$-by-$p_1$ and that $n = rp_1$.
Regard $A = (A_{ij})$, $B = (B_{ij})$, and $C = (C_{ij})$ as $p_1$-by-$p_1$ block matrices
with r-by-r blocks. Assume that Proc(i, j) contains $A_{ij}$, $B_{ij}$, and $C_{ij}$ and
that its mission is to overwrite $C_{ij}$ with

$$D_{ij} = C_{ij} + \sum_{k=1}^{p_1} A_{ik}B_{kj}.$$


We develop the general algorithm from the $p_1 = 3$ case, displaying the torus
in cellular form as follows:
Let us focus attention on Proc(1,1) and the calculation of

$$D_{11} = C_{11} + A_{11}B_{11} + A_{12}B_{21} + A_{13}B_{31}.$$


Suppose the six inputs that define this block dot product are positioned 
within the torus as follows: 


(Pay no attention to the “dots.” They are later replaced by various $A_{ij}$
and $B_{ij}$.)

Our plan is to “ratchet” the first block row of A and the first block
column of B through Proc(1,1) in a coordinated fashion. The pairs $A_{11}$
and $B_{11}$, $A_{12}$ and $B_{21}$, and $A_{13}$ and $B_{31}$ meet, are multiplied, and added
into a running sum array C_loc:

    C_loc = C_loc + A_12 B_21
    C_loc = C_loc + A_13 B_31
    C_loc = C_loc + A_11 B_11




Thus, after three steps, the local array C_loc in Proc(1,1) houses $D_{11}$.

We have organized the flow of data so that the $A_{1j}$ migrate westwards
and the $B_{i1}$ migrate northwards through the torus. It is thus apparent that
Proc(1,1) must execute a node program of the form:

    for t = 1:3
        send(A_loc, west)
        send(B_loc, north)
        recv(A_loc, east)
        recv(B_loc, south)
        C_loc = C_loc + A_loc B_loc
    end


The send-recv-send-recv sequence 


    for t = 1:3
        send(A_loc, west)
        recv(A_loc, east)
        send(B_loc, north)
        recv(B_loc, south)
        C_loc = C_loc + A_loc B_loc
    end


also works. However, this induces unnecessary delays into the process be- 
cause the B submatrix is not sent until the new A submatrix arrives. 

We next consider the activity in Proc(1,2), Proc(1,3), Proc(2,1), and
Proc(3,1). At this point in the development, these processors merely help
circulate blocks $A_{11}$, $A_{12}$, and $A_{13}$ and $B_{11}$, $B_{21}$, and $B_{31}$, respectively. If
$B_{32}$, $B_{12}$, and $B_{22}$ flowed through Proc(1,2) during these steps, then

$$D_{12} = C_{12} + A_{13}B_{32} + A_{11}B_{12} + A_{12}B_{22}$$

could be formed. Likewise, Proc(1,3) could compute

$$D_{13} = C_{13} + A_{11}B_{13} + A_{12}B_{23} + A_{13}B_{33}$$

if $B_{13}$, $B_{23}$, and $B_{33}$ are available during t = 1:3. To this end we initialize
the torus as follows:


With northward flow of the $B_{ij}$ we get

Thus, if B is mapped onto the torus in a “staggered start” fashion, we can
arrange for the first row of processors to compute the first row of C.

If we stagger the second and third rows of A in a similar fashion, then 
we can arrange for all nine processors to perform a multiply-add at each 
step. In particular, if we set 


then with westward flow of the $A_{ij}$ and northward flow of the $B_{ij}$ we obtain

From this example we are ready to specify the general algorithm. We
assume that at the start, Proc(i, j) houses $A_{ij}$, $B_{ij}$, and $C_{ij}$. To obtain the
necessary staggering of the A data, we note that in processor row i the $A_{ij}$
should be circulated westward i − 1 positions. Likewise, in the jth column
of processors, the $B_{ij}$ should be circulated northward j − 1 positions. This
gives the following algorithm:


Algorithm 6.2.2 Suppose $A \in \mathbb{R}^{n\times n}$, $B \in \mathbb{R}^{n\times n}$, and $C \in \mathbb{R}^{n\times n}$ are given
and that D = C + AB. If each processor in a $p_1$-by-$p_1$ torus executes
the following algorithm and $n = p_1 r$, then upon completion Proc(μ, λ)
houses $D_{\mu\lambda}$ in local variable C_loc. Assume the following local memory
initializations: $p_1$, (μ, λ) (the node id), north, east, south, and west (the
four neighbor id's), row = 1+(μ-1)r:μr, col = 1+(λ-1)r:λr,
A_loc = A(row, col), B_loc = B(row, col), and C_loc = C(row, col).


    {Stagger the A_ij and B_ij.}
    for k = 1:μ-1
        send(A_loc, west); recv(A_loc, east)
    end
    for k = 1:λ-1
        send(B_loc, north); recv(B_loc, south)
    end
    for k = 1:p_1
        C_loc = C_loc + A_loc B_loc
        send(A_loc, west)
        send(B_loc, north)
        recv(A_loc, east)
        recv(B_loc, south)
    end
    {Unstagger the A_ij and B_ij.}
    for k = 1:μ-1
        send(A_loc, east); recv(A_loc, west)
    end
    for k = 1:λ-1
        send(B_loc, south); recv(B_loc, north)
    end


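It is not hard to show that the communication-to-computation ratio for
this algorithm goes to zero as $n/p_1$ increases: with $r = n/p_1$, each of the
$p_1$ steps performs $2r^3$ flops but moves only two r-by-r blocks.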
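The staggering and circulation of Algorithm 6.2.2 can be simulated on one machine. The sketch below is a lockstep simulation rather than a message-passing program: every processor performs its multiply-add, and then all blocks shift at once. Indices are 0-based, and the final unstaggering loops are omitted because they only restore A and B and do not affect D. The function names are hypothetical.

```python
def block_multiply_add(C, A, B):
    """Return C + A B for r-by-r blocks stored as lists of rows."""
    r = len(C)
    return [[C[i][j] + sum(A[i][k] * B[k][j] for k in range(r)) for j in range(r)]
            for i in range(r)]


def to_blocks(M, p1):
    """Split an n-by-n matrix into a p1-by-p1 grid of r-by-r blocks (n = r*p1)."""
    r = len(M) // p1
    return [[[row[j * r:(j + 1) * r] for row in M[i * r:(i + 1) * r]]
             for j in range(p1)] for i in range(p1)]


def torus_multiply(A, B, C, p1):
    """Lockstep simulation of Algorithm 6.2.2 on a p1-by-p1 torus; returns D = C + A B."""
    A_loc, B_loc, C_loc = to_blocks(A, p1), to_blocks(B, p1), to_blocks(C, p1)
    # Stagger (0-based): row i of A moves i places west, column j of B moves j places
    # north, so Proc(i, j) holds A_{i,k} and B_{k,j} with k = (i + j) mod p1.
    A_loc = [[A_loc[i][(j + i) % p1] for j in range(p1)] for i in range(p1)]
    B_loc = [[B_loc[(i + j) % p1][j] for j in range(p1)] for i in range(p1)]
    for _ in range(p1):
        C_loc = [[block_multiply_add(C_loc[i][j], A_loc[i][j], B_loc[i][j])
                  for j in range(p1)] for i in range(p1)]
        # send west / recv east: Proc(i, j) receives the block held by Proc(i, j+1)
        A_loc = [[A_loc[i][(j + 1) % p1] for j in range(p1)] for i in range(p1)]
        # send north / recv south: Proc(i, j) receives the block held by Proc(i+1, j)
        B_loc = [[B_loc[(i + 1) % p1][j] for j in range(p1)] for i in range(p1)]
    n, r = len(A), len(A) // p1
    return [[C_loc[I // r][J // r][I % r][J % r] for J in range(n)] for I in range(n)]


n = 6
A = [[i + 2 * j for j in range(n)] for i in range(n)]
B = [[(i * j) % 5 - 2 for j in range(n)] for i in range(n)]
C = [[i - j for j in range(n)] for i in range(n)]
D = torus_multiply(A, B, C, p1=3)
```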


Problems 


P6.2.1 Develop a ring implementation for Algorithm 6.2.1. 


P6.2.2 An upper triangular matrix can be overwritten with its square without any
additional workspace. Write a dynamically scheduled, shared-memory procedure for
doing this.
6.3 Factorizations


In this section we present a pair of parallel Cholesky factorizations. To 
illustrate what a distributed memory factorization looks like, we implement 
the gaxpy Cholesky algorithm on a ring. A shared memory implementation 
of outer product Cholesky is also detailed. 


6.3.1 A Ring Cholesky 


Let us see how the Cholesky factorization procedure can be distributed on 
a ring of p processors. The starting point is the equation 


$$G(\mu,\mu)\,G(\mu:n,\mu) = A(\mu:n,\mu) - \sum_{j=1}^{\mu-1} G(\mu,j)\,G(\mu:n,j) = v(\mu:n).$$

This equation is obtained by equating the μth column in the n-by-n
equation $A = GG^T$. Once the vector $v(\mu:n)$ is found, then $G(\mu:n,\mu)$ is a
simple scaling:

$$G(\mu:n,\mu) = v(\mu:n)/\sqrt{v(\mu)}.$$


For clarity, we first assume that n = p and that Proc(μ) initially houses
A(μ:n, μ). Upon completion, each processor overwrites its A-column with
the corresponding G-column. For Proc(μ) this process involves μ − 1 saxpy
updates of the form

$$A(\mu:n,\mu) \leftarrow A(\mu:n,\mu) - G(\mu,j)\,G(\mu:n,j)$$

followed by a square root and a scaling. The general structure of Proc(μ)'s
node program is therefore as follows:


    for j = 1:μ-1
        Receive a G-column from the left neighbor.
        If necessary, send a copy of the received G-column to
            the right neighbor.
        Update A(μ:n, μ).
    end
    Generate G(μ:n, μ) and, if necessary, send it to the
        right neighbor.


Thus Proc(1) immediately computes $G(1:n, 1) = A(1:n, 1)/\sqrt{A(1, 1)}$ and
sends it to Proc(2). As soon as Proc(2) receives this column it can generate
G(2:n, 2) and pass it to Proc(3), etc. With this pipelining arrangement we
can assert that once a processor computes its G-column, it can quit. It
also follows that each processor receives G-columns in ascending order, i.e.,
G(1:n, 1), G(2:n, 2), etc. Based on these observations we have

    j = 1
    while j < μ
        recv(g_loc(j:n), left)
        if μ < n
            send(g_loc(j:n), right)
        end
        A_loc(μ:n) = A_loc(μ:n) - g_loc(μ) g_loc(μ:n)
        j = j + 1
    end
    A_loc(μ:n) = A_loc(μ:n)/sqrt(A_loc(μ))
    if μ < n
        send(A_loc(μ:n), right)
    end


Note that the number of received G-columns is given by j − 1. If j = μ,
then it is time for Proc(μ) to generate and send G(μ:n, μ).
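Here is a minimal thread-based simulation of the n = p ring program. It assumes one `queue.Queue` per processor as its link from the left neighbor and 0-based indices; for simplicity each message carries a full-length column whose entries above the diagonal are zero, rather than the slice g_loc(j:n). The function name is hypothetical.

```python
import math
import queue
import threading


def ring_cholesky_n_eq_p(A):
    """Simulate the n = p ring Cholesky; Proc(mu) owns column mu (0-based here)."""
    n = len(A)
    inbox = [queue.Queue() for _ in range(n)]      # inbox[mu]: messages from the left neighbor
    G = [[0.0] * n for _ in range(n)]

    def node(mu):
        right = (mu + 1) % n
        a_loc = [float(A[i][mu]) for i in range(n)]  # local copy of column mu; rows mu:n are used
        for _ in range(mu):                        # G-columns 0..mu-1 arrive in ascending order
            g_loc = inbox[mu].get()                # recv(g_loc, left)
            if mu < n - 1:
                inbox[right].put(g_loc)            # forward the received column to the right
            for i in range(mu, n):                 # A(mu:n, mu) -= G(mu, j) G(mu:n, j)
                a_loc[i] -= g_loc[mu] * g_loc[i]
        g = [0.0] * n                              # generate G(mu:n, mu)
        d = math.sqrt(a_loc[mu])
        for i in range(mu, n):
            g[i] = a_loc[i] / d
        if mu < n - 1:
            inbox[right].put(g)                    # send(G(mu:n, mu), right)
        for i in range(n):
            G[i][mu] = g[i]

    threads = [threading.Thread(target=node, args=(mu,)) for mu in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return G


print(ring_cholesky_n_eq_p([[4, 2, 2], [2, 5, 3], [2, 3, 6]]))
# [[2.0, 0.0, 0.0], [1.0, 2.0, 0.0], [1.0, 1.0, 2.0]]
```

Because a queue delivers messages in the order they were put, the ascending arrival order of the G-columns noted above holds automatically.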

We now extend this strategy to the general n case. There are two obvi- 
ous ways to distribute the computation. We could require each processor 
to compute a contiguous set of G-columns. For example, if n = 11, p = 3,
and $A = [a_1, \ldots, a_{11}]$, then we could distribute A as follows:

    [ a1 a2 a3 a4 | a5 a6 a7 a8 | a9 a10 a11 ]
       Proc(1)       Proc(2)       Proc(3)

Each processor could then proceed to find the corresponding G-columns.
The trouble with this approach is that (for example) Proc(1) is idle after
the fourth column of G is found even though much work remains.

Greater load balancing results if we distribute the computational tasks
using the wrap mapping, i.e.,

    [ a1 a4 a7 a10 | a2 a5 a8 a11 | a3 a6 a9 ]
       Proc(1)        Proc(2)        Proc(3)

In this scheme Proc(μ) carries out the construction of G(:, μ:p:n). When
a given processor finishes computing its G-columns, each of the other
processors has at most one more G-column to find. Thus if n/p >> 1, then all
of the processors are busy most of the time.

Let us examine the details of a wrap-distributed Cholesky procedure. 
Each processor maintains a pair of counters. The counter j is the
index of the next G-column to be received by Proc(μ). A processor also
needs to know the index of the next G-column that it is to produce. Note
that if col = μ:p:n, then Proc(μ) is responsible for G(:, col) and that
L = length(col) is the number of G-columns that it must compute.
We use q to indicate the status of G-column production. At any instant,
col(q) is the index of the next G-column to be produced.


Algorithm 6.3.1 Suppose $A \in \mathbb{R}^{n\times n}$ is symmetric and positive definite
and that $A = GG^T$ is its Cholesky factorization. If each node in
a p-processor ring executes the following program, then upon completion
Proc(μ) houses G(k:n, k) for k = μ:p:n in a local array A_loc(1:n, L) where
L = length(col) and col = μ:p:n. In particular, G(col(q):n, col(q)) is
housed in A_loc(col(q):n, q) for q = 1:L. Assume the following local memory
initializations: p, μ (the node id), left and right (the neighbor id's), n, and
A_loc = A(:, μ:p:n).


    j = 1; q = 1; col = μ:p:n; L = length(col)
    while q ≤ L
        if j = col(q)
            {Form G(j:n, j).}
            A_loc(j:n, q) = A_loc(j:n, q)/sqrt(A_loc(j, q))
            if j < n
                send(A_loc(j:n, q), right)
            end
            j = j + 1
            {Update local columns.}
            for k = q+1:L
                r = col(k)
                A_loc(r:n, k) = A_loc(r:n, k) - A_loc(r, q) A_loc(r:n, q)
            end
            q = q + 1
        else
            recv(g_loc(j:n), left)
            Compute α, the id of the processor that generated the
                received G-column.
            Compute β, the index of Proc(right)'s final column.
            if right ≠ α and j < β
                send(g_loc(j:n), right)
            end
            {Update local columns.}
            for k = q:L
                r = col(k)
                A_loc(r:n, k) = A_loc(r:n, k) - g_loc(r) g_loc(r:n)
            end
            j = j + 1
        end
    end


To illustrate the logic of the pointer system we consider a sample 3-processor
situation with n = 10. Assume that the three local values of q are 3, 2, and
2 and that the corresponding values of col(q) are 7, 5, and 6:

    [ a1 a4 a7 a10 | a2 a5 a8 | a3 a6 a9 ]
       Proc(1)        Proc(2)    Proc(3)

Proc(2) now generates the fifth G-column and increments its q to 3.

The decision to pass a received G-column to the right neighbor needs 
to be explained. Two conditions must be fulfilled: 


* The right neighbor must not be the processor which generated the
G-column. This way the circulation of the received G-column is properly
terminated.


* The right neighbor must still have more G-columns to generate. Oth- 
erwise, a G-column will be sent to an inactive processor. 
This kind of reasoning is quite typical in distributed memory matrix com- 
putations. 
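The counter logic of Algorithm 6.3.1, including both forwarding conditions, can be exercised with a thread-per-processor simulation. This sketch assumes 0-based indices, one `queue.Queue` per link, and full-length column messages; α and β from the algorithm appear as `alpha` and `final_col(right)`. It illustrates the forwarding rules and is not a performance model; the function name is hypothetical.

```python
import math
import queue
import threading


def ring_cholesky(A, p):
    """Simulate Algorithm 6.3.1 (0-based): Proc(mu) owns columns mu, mu+p, mu+2p, ..."""
    n = len(A)
    inbox = [queue.Queue() for _ in range(p)]        # inbox[mu]: messages from the left
    G = [[0.0] * n for _ in range(n)]

    def final_col(m):                                # Proc(m)'s last column, or -1 if none
        return m + ((n - 1 - m) // p) * p if m < n else -1

    def node(mu):
        right = (mu + 1) % p
        col = list(range(mu, n, p))
        L = len(col)
        A_loc = [[float(A[i][c]) for i in range(n)] for c in col]   # A_loc[q] = column col[q]
        j, q = 0, 0
        while q < L:
            if j == col[q]:                          # form G(j:n, j) locally
                d = math.sqrt(A_loc[q][j])
                for i in range(j, n):
                    A_loc[q][i] /= d
                if j < n - 1:
                    inbox[right].put(A_loc[q][:])    # send a copy to the right neighbor
                for k in range(q + 1, L):            # update the remaining local columns
                    r = col[k]
                    for i in range(r, n):
                        A_loc[k][i] -= A_loc[q][r] * A_loc[q][i]
                q += 1
            else:                                    # G-column j arrives from the left
                g = inbox[mu].get()
                alpha = j % p                        # id of the processor that generated it
                if right != alpha and j < final_col(right):
                    inbox[right].put(g)
                for k in range(q, L):
                    r = col[k]
                    for i in range(r, n):
                        A_loc[k][i] -= g[r] * g[i]
            j += 1
        for q, c in enumerate(col):                  # copy G(c:n, c) out for inspection
            for i in range(c, n):
                G[i][c] = A_loc[q][i]

    threads = [threading.Thread(target=node, args=(mu,)) for mu in range(p)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return G


print(ring_cholesky([[4, 2, 2], [2, 5, 3], [2, 3, 6]], p=2))
# [[2.0, 0.0, 0.0], [1.0, 2.0, 0.0], [1.0, 1.0, 2.0]]
```

If either forwarding test is dropped, a column reaches a processor that never reads it (harmless here) or circulates back to its generator; with the tests in place every thread terminates.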
Let us examine the behavior of Algorithm 6.3.1 under the assumption
that n >> p. It is not hard to show that Proc(μ) performs

$$F(\mu) = \sum_{k=1}^{L} 2\bigl(n - (\mu + (k-1)p)\bigr)\bigl(\mu + (k-1)p\bigr) \approx \frac{n^3}{3p}$$

flops. Each processor receives and sends just about every G-column. Using
our communication overhead model (6.1.3), we see that the time each
processor spends communicating is given by

$$\sum_{j=1}^{n} 2\bigl(\alpha_d + \beta_d(n-j)\bigr) \approx 2\alpha_d n + \beta_d n^2.$$

If we assume that computation proceeds at R flops per second, then the
computation/communication ratio for Algorithm 6.3.1 is approximately
given by $(n/p)\bigl(1/(3R\beta_d)\bigr)$. Thus, communication overheads diminish in
importance as n/p grows.
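The approximations in this estimate are easy to check by evaluating the exact sums. The sketch below uses hypothetical helper names and the illustrative values n = 3000, p = 4; it follows the text in charging n − j numbers for the jth column.

```python
def ring_cholesky_flops(mu, n, p):
    """Exact flop sum from the text for Proc(mu), 1-based, owning columns mu:p:n."""
    return sum(2 * (n - c) * c for c in range(mu, n + 1, p))


def ring_comm_time(n, alpha_d, beta_d):
    """Send plus receive of every G-column under model (6.1.3)."""
    return sum(2 * (alpha_d + beta_d * (n - j)) for j in range(1, n + 1))


n, p = 3000, 4
for mu in range(1, p + 1):
    print(mu, ring_cholesky_flops(mu, n, p) / (n**3 / (3 * p)))   # each ratio is close to 1
print(ring_comm_time(n, 0, 1) / n**2)                               # close to 1: beta_d n^2 dominates
```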


6.3.2 A Shared Memory Cholesky 


Next we consider a shared memory implementation of the outer product 
Cholesky algorithm: 


    for k = 1:n
        A(k:n, k) = A(k:n, k)/sqrt(A(k, k))
        for j = k+1:n
            A(j:n, j) = A(j:n, j) - A(j:n, k) A(j, k)
        end
    end
The j-loop oversees an outer product update. The n − k saxpy operations
that make up its body are independent and easily parallelized. The scaling
of A(k:n, k) can be carried out by a single processor with no threat to load
balancing.


Algorithm 6.3.2 Suppose $A \in \mathbb{R}^{n\times n}$ is a symmetric positive definite
matrix stored in a shared memory accessible to p processors. If each
processor executes the following algorithm, then upon completion the lower
triangular part of A is overwritten with its Cholesky factor. Assume the
following initializations in each local memory: n, p and μ (the node id).

    for k = 1:n
        if μ = 1
            v_loc(k:n) = A(k:n, k)
            v_loc(k:n) = v_loc(k:n)/sqrt(v_loc(k))
            A(k:n, k) = v_loc(k:n)
        end
        barrier
        v_loc(k+1:n) = A(k+1:n, k)
        for j = (k+μ):p:n
            w_loc(j:n) = A(j:n, j)
            w_loc(j:n) = w_loc(j:n) - v_loc(j) v_loc(j:n)
            A(j:n, j) = w_loc(j:n)
        end
        barrier
    end
The scaling before the j-loop represents very little work compared to the 
outer product update and so it is reasonable to assign that portion of the 
computation to a single processor. Notice that two barrier statements are
required. The first ensures that a processor does not begin working on the
kth outer product update until the kth column of G is made available by
Proc(1). The second barrier prevents the processing of the (k+1)st step from
beginning until the kth step is completely finished.
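Algorithm 6.3.2 maps directly onto `threading.Barrier`. In the minimal sketch below, Python threads stand in for processors, A is a list of lists of floats in shared memory, and indices are 0-based, so the text's j = (k+μ):p:n becomes `range(k + mu, n, p)`. The function name is hypothetical.

```python
import math
import threading


def shared_cholesky(A, p):
    """Sketch of Algorithm 6.3.2: the lower triangle of A is overwritten with G."""
    n = len(A)
    barrier = threading.Barrier(p)

    def node(mu):                                   # mu = 1..p
        for k in range(n):                          # k is 0-based here
            if mu == 1:                             # Proc(1) scales column k
                d = math.sqrt(A[k][k])
                for i in range(k, n):
                    A[i][k] /= d
            barrier.wait()                          # column k of G is ready
            v_loc = [A[i][k] for i in range(n)]     # v_loc(k+1:n) = A(k+1:n, k)
            for j in range(k + mu, n, p):           # 0-based form of j = (k+mu):p:n
                for i in range(j, n):               # A(j:n, j) -= v_loc(j) v_loc(j:n)
                    A[i][j] -= v_loc[j] * v_loc[i]
            barrier.wait()                          # step k completes before step k+1

    threads = [threading.Thread(target=node, args=(mu,)) for mu in range(1, p + 1)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return A


A = [[4.0, 2.0, 2.0], [2.0, 5.0, 3.0], [2.0, 3.0, 6.0]]
shared_cholesky(A, p=2)
print([[A[i][j] for j in range(i + 1)] for i in range(3)])   # [[2.0], [1.0, 2.0], [1.0, 1.0, 2.0]]
```

Each column j > k is updated by exactly one thread, and column k itself is only read during the update, so the two barriers are the only synchronization required.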


Problems 


P6.3.1 It is possible to formulate a block version of Algorithm 6.3.1. Suppose n = rN.
For k = 1:N we (a) have Proc(1) generate G(:, 1+(k-1)r:kr) and (b) have all processors
participate in the rank-r update of the trailing submatrix A(kr+1:n, kr+1:n). See §4.2.6.
The coarser granularity may improve performance if the individual processors like level-3
operations.

P6.3.2 Develop a shared memory QR factorization patterned after Algorithm 6.3.2.
Proc(1) should generate the Householder vectors and all processors should share in the
ensuing Householder update.


Chapter 7 


The Unsymmetric 
Eigenvalue Problem 


§7.1 Properties and Decompositions
§7.2 Perturbation Theory
§7.3 Power Iterations
§7.4 The Hessenberg and Real Schur Forms
§7.5 The Practical QR Algorithm
§7.6 Invariant Subspace Computations
§7.7 The QZ Method for $Ax = \lambda Bx$


Having discussed linear equations and least squares, we now direct our 
attention to the third major problem area in matrix computations, the 
algebraic eigenvalue problem. The unsymmetric problem is considered in 
this chapter and the more agreeable symmetric case in the next. 

Our first task is to present the decompositions of Schur and Jordan 
along with the basic properties of eigenvalues and invariant subspaces. The 
contrasting behavior of these two decompositions sets the stage for §7.2 
in which we investigate how the eigenvalues and invariant subspaces of 
a matrix are affected by perturbation. Condition numbers are developed 
that permit estimation of the errors that can be expected to arise because 
of roundoff. 

The key algorithm of the chapter is the justly famous QR algorithm. 
This procedure is the most complex algorithm presented in this book and its 
development is spread over three sections. We derive the basic QR iteration 
in §7.3 as a natural generalization of the simple power method. The next
two sections are devoted to making this basic iteration computationally
feasible. This involves the introduction of the Hessenberg decomposition in
§7.4 and the notion of origin shifts in §7.5.


The QR algorithm computes the real Schur form of a matrix, a canonical 
form that displays eigenvalues but not eigenvectors. Consequently, addi- 
tional computations usually must be performed if information regarding 
invariant subspaces is desired. In §7.6, which could be subtitled, “What to
Do after the Real Schur Form is Calculated," we discuss various invariant 
subspace calculations that can follow the QR algorithm. 


Finally, in the last section we consider the generalized eigenvalue
problem $Ax = \lambda Bx$ and a variant of the QR algorithm that has been devised to
solve it. This algorithm, called the QZ algorithm, underscores the impor- 
tance of orthogonal matrices in the eigenproblem, a central theme of the 
chapter. 


It is appropriate at this time to make a remark about complex versus real 
arithmetic. In this book, we focus on the development of real arithmetic 
algorithms for real matrix problems. This chapter is no exception even 
though a real unsymmetric matrix can have complex eigenvalues. However, 
in the derivation of the practical, real arithmetic QR algorithm and in the 
mathematical analysis of the eigenproblem itself, it is convenient to work 
in the complex field. Thus, the reader will find that we have switched to
complex notation in §7.1, §7.2, and §7.3. In these sections, we use complex
versions of the QR factorization, the singular value decomposition, and the 
CS decomposition. 


Before You Begin 


Chapters 1-3 and §§5.1-5.2 are assumed. Within this chapter there are 
the following dependencies: 


§7.1 → §7.2 → §7.3 → §7.4 → §7.5 → §7.6 → §7.7


Complementary references include Fox (1964), Wilkinson (1965), Gourlay 
and Watson (1973), Stewart (1973), Hager (1988), Ciarlet (1989), Stewart 
and Sun (1990), Watkins (1991), Saad (1992), Jennings and McKeown
(1992), Datta (1995), Trefethen and Bau (1997), and Demmel (1996). Some 
Matlab functions important to this chapter are eig, poly, polyeig, hess, 
qz, rsf2csf, cdf2rdf, schur, and balance. LAPACK connections include 
LAPACK: Unsymmetric Eigenproblem

    _GEBAL   Balance transform
             Hessenberg reduction U^H A U = H
             U (factored form) times matrix (real case)
             Generates U (real case)
             U (factored form) times matrix (complex case)
             Generates U (complex case)
    _HSEQR   Schur decomposition of Hessenberg matrix
    _HSEIN   Eigenvectors of Hessenberg matrix by inverse iteration
             Schur decomp of general matrix with e.value ordering
             Same but with condition estimates
             Eigenvalues and left and right eigenvectors of general matrix
             Same but with condition estimates
             Selected eigenvectors of upper quasitriangular matrix
             Cond. estimates of selected eigenvalues of upper quasitriangular matrix
             Unitary reordering of Schur decomposition
             Same but with condition estimates
             Solves AX + XB = C for upper quasitriangular A and B

LAPACK: Unsymmetric Generalized Eigenproblem

             Balance transform
             Reduction to Hessenberg-Triangular form
             Generalized Schur decomposition
             Eigenvectors
             Undo balance transform


7.1 Properties and Decompositions 


In this section we survey the mathematical background necessary to develop 
and analyze the eigenvalue algorithms that follow. 


7.1.1 Eigenvalues and Invariant Subspaces 


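The eigenvalues of a matrix $A \in \mathbb{C}^{n\times n}$ are the $n$ roots of its characteristic polynomial $p(z) = \det(zI - A)$. The set of these roots is called the spectrum and is denoted by $\lambda(A)$. If $\lambda(A) = \{\lambda_1, \ldots, \lambda_n\}$, then $p(z) = (z - \lambda_1)\cdots(z - \lambda_n)$, and setting $z = 0$ shows that

$$\det(A) = \lambda_1 \lambda_2 \cdots \lambda_n.$$

Moreover, if we define the trace of $A$ by

$$\operatorname{tr}(A) = \sum_{i=1}^{n} a_{ii},$$

then $\operatorname{tr}(A) = \lambda_1 + \cdots + \lambda_n$. This follows by looking at the coefficient of $z^{n-1}$ in the characteristic polynomial.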
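Both identities can be checked independently of any eigenvalue solver by using the matrix of Example 7.1.1 below, whose spectrum $\{25, 75 + 100i, 75 - 100i\}$ is quoted in the text. The sketch below assumes plain Python with no third-party packages; the helper name `det3` is an illustrative choice. It compares a cofactor-expansion determinant and the diagonal sum of $A$ with the product and sum of the quoted eigenvalues.

```python
# Matrix and spectrum from Example 7.1.1: lambda(A) = {25, 75 + 100i, 75 - 100i}.
A = [[67.00, 177.60, -63.20],
     [-20.40, 95.88, -87.16],
     [22.80, 67.84, 12.12]]
eigenvalues = [25, 75 + 100j, 75 - 100j]

def det3(M):
    """Determinant of a 3-by-3 matrix by cofactor expansion along the first row."""
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

trace = sum(A[i][i] for i in range(3))
prod = 1
for lam in eigenvalues:
    prod *= lam

print(det3(A), prod)              # det(A) versus lambda_1 * lambda_2 * lambda_3
print(trace, sum(eigenvalues))    # tr(A) versus lambda_1 + lambda_2 + lambda_3
```

Both pairs should agree up to rounding: the determinant is $25 \cdot 15625 = 390625$ and the trace is $175$.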
If $\lambda \in \lambda(A)$, then the nonzero vectors $x \in \mathbb{C}^n$ that satisfy

$$Ax = \lambda x$$

are referred to as eigenvectors. More precisely, $x$ is a right eigenvector for $\lambda$ if $Ax = \lambda x$ and a left eigenvector if $x^H A = \lambda x^H$. Unless otherwise stated, "eigenvector" means "right eigenvector."

An eigenvector defines a one-dimensional subspace that is invariant with respect to premultiplication by $A$. More generally, a subspace $S \subseteq \mathbb{C}^n$ with the property that

$$x \in S \implies Ax \in S$$

is said to be invariant (for $A$). Note that if

$$AX = XB, \qquad B \in \mathbb{C}^{k\times k}, \quad X \in \mathbb{C}^{n\times k},$$

then $\operatorname{ran}(X)$ is invariant and $By = \lambda y \implies A(Xy) = \lambda(Xy)$. Thus, if $X$ has full column rank, then $AX = XB$ implies that $\lambda(B) \subseteq \lambda(A)$. If $X$ is square and nonsingular, then $\lambda(A) = \lambda(B)$ and we say that $A$ and $B = X^{-1}AX$ are similar. In this context, $X$ is called a similarity transformation.


7.1.2 Decoupling 


Many eigenvalue computations involve breaking the given problem down 
into a collection of smaller eigenproblems. The following result is the basis 
for these reductions. 


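Lemma 7.1.1 If $T \in \mathbb{C}^{n\times n}$ is partitioned as follows,

$$T = \begin{bmatrix} T_{11} & T_{12} \\ 0 & T_{22} \end{bmatrix} \begin{matrix} p \\ q \end{matrix}$$

(block columns of width $p$ and $q$), then $\lambda(T) = \lambda(T_{11}) \cup \lambda(T_{22})$.

Proof. Suppose

$$Tx = \begin{bmatrix} T_{11} & T_{12} \\ 0 & T_{22} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \lambda \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$$

where $x_1 \in \mathbb{C}^p$ and $x_2 \in \mathbb{C}^q$. If $x_2 \neq 0$, then $T_{22}x_2 = \lambda x_2$ and so $\lambda \in \lambda(T_{22})$. If $x_2 = 0$, then $T_{11}x_1 = \lambda x_1$ and so $\lambda \in \lambda(T_{11})$. It follows that $\lambda(T) \subseteq \lambda(T_{11}) \cup \lambda(T_{22})$. But since both $\lambda(T)$ and $\lambda(T_{11}) \cup \lambda(T_{22})$ have the same cardinality, the two sets are equal. □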


7.1.3 The Basic Unitary Decompositions 


By using similarity transformations, it is possible to reduce a given matrix 
to any one of several canonical forms. The canonical forms differ in how 
they display the eigenvalues and in the kind of invariant subspace informa- 
tion that they provide. Because of their numerical stability we begin by 
discussing the reductions that can be achieved with unitary similarity. 
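Lemma 7.1.2 If $A \in \mathbb{C}^{n\times n}$, $B \in \mathbb{C}^{p\times p}$, and $X \in \mathbb{C}^{n\times p}$ satisfy

$$AX = XB, \qquad \operatorname{rank}(X) = p, \tag{7.1.1}$$

then there exists a unitary $Q \in \mathbb{C}^{n\times n}$ such that

$$Q^H A Q = T = \begin{bmatrix} T_{11} & T_{12} \\ 0 & T_{22} \end{bmatrix} \begin{matrix} p \\ n-p \end{matrix} \tag{7.1.2}$$

where $\lambda(T_{11}) = \lambda(A) \cap \lambda(B)$.

Proof. Let

$$X = Q \begin{bmatrix} R_1 \\ 0 \end{bmatrix}, \qquad Q \in \mathbb{C}^{n\times n}, \quad R_1 \in \mathbb{C}^{p\times p}$$

be a QR factorization of $X$. By substituting this into (7.1.1) and rearranging we have

$$\begin{bmatrix} T_{11} & T_{12} \\ T_{21} & T_{22} \end{bmatrix} \begin{bmatrix} R_1 \\ 0 \end{bmatrix} = \begin{bmatrix} R_1 \\ 0 \end{bmatrix} B$$

where

$$Q^H A Q = \begin{bmatrix} T_{11} & T_{12} \\ T_{21} & T_{22} \end{bmatrix} \begin{matrix} p \\ n-p \end{matrix}.$$

By using the nonsingularity of $R_1$ and the equations $T_{21}R_1 = 0$ and $T_{11}R_1 = R_1 B$, we can conclude that $T_{21} = 0$ and $\lambda(T_{11}) = \lambda(B)$. The conclusion now follows because from Lemma 7.1.1 $\lambda(A) = \lambda(T) = \lambda(T_{11}) \cup \lambda(T_{22})$. □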

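Example 7.1.1 If

$$A = \begin{bmatrix} 67.00 & 177.60 & -63.20 \\ -20.40 & 95.88 & -87.16 \\ 22.80 & 67.84 & 12.12 \end{bmatrix},$$

$X = [20, -9, -12]^T$, and $B = [25]$, then $AX = XB$. Moreover, if the orthogonal matrix $Q$ is defined by

$$Q = \begin{bmatrix} -.800 & .360 & .480 \\ .360 & .928 & -.096 \\ .480 & -.096 & .872 \end{bmatrix},$$

then $Q^T X = [-25, 0, 0]^T$ and

$$Q^T A Q = T = \begin{bmatrix} 25 & -90 & 5 \\ 0 & 147 & -104 \\ 0 & 146 & 3 \end{bmatrix}.$$

A calculation shows that $\lambda(A) = \{25,\ 75 + 100i,\ 75 - 100i\}$.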
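The claims of Example 7.1.1 can be verified in a few lines. This is a minimal sketch in plain Python; `matmul`, `transpose`, and `matvec` are ad hoc helper names for this illustration, not library functions. Because $Q$ is real and orthogonal, $Q^T$ plays the role of $Q^H$ in Lemma 7.1.2.

```python
# Data from Example 7.1.1, entered exactly as printed.
A = [[67.00, 177.60, -63.20],
     [-20.40, 95.88, -87.16],
     [22.80, 67.84, 12.12]]
Q = [[-0.800, 0.360, 0.480],
     [0.360, 0.928, -0.096],
     [0.480, -0.096, 0.872]]
X = [20.0, -9.0, -12.0]

def matmul(P, R):
    return [[sum(P[i][k] * R[k][j] for k in range(len(R))) for j in range(len(R[0]))]
            for i in range(len(P))]

def transpose(P):
    return [list(row) for row in zip(*P)]

def matvec(P, v):
    return [sum(p * x for p, x in zip(row, v)) for row in P]

AX = matvec(A, X)                      # should equal 25 * X, i.e. AX = XB with B = [25]
QtX = matvec(transpose(Q), X)          # should be [-25, 0, 0]
T = matmul(matmul(transpose(Q), A), Q) # block upper triangular with T11 = 25
for row in T:
    print(["%9.4f" % t for t in row])
```

The first column of $T$ comes out as $[25, 0, 0]^T$, which is exactly the block structure (7.1.2) with $p = 1$.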


Lemma 7.1.2 says that a matrix can be reduced to block triangular form 
using unitary similarity transformations if we know one of its invariant 
subspaces. By induction we can readily establish the decomposition of 
Schur (1909). 
Theorem 7.1.3 (Schur Decomposition) If $A \in \mathbb{C}^{n\times n}$, then there exists a unitary $Q \in \mathbb{C}^{n\times n}$ such that

$$Q^H A Q = T = D + N \tag{7.1.3}$$

where $D = \operatorname{diag}(\lambda_1, \ldots, \lambda_n)$ and $N \in \mathbb{C}^{n\times n}$ is strictly upper triangular. Furthermore, $Q$ can be chosen so that the eigenvalues $\lambda_i$ appear in any order along the diagonal.

Proof. The theorem obviously holds when $n = 1$. Suppose it holds for all matrices of order $n - 1$ or less. If $Ax = \lambda x$, where $x \neq 0$, then by Lemma 7.1.2 (with $B = (\lambda)$) there exists a unitary $U$ such that

$$U^H A U = \begin{bmatrix} \lambda & w^H \\ 0 & C \end{bmatrix} \begin{matrix} 1 \\ n-1 \end{matrix}.$$

By induction there is a unitary $\tilde{U}$ such that $\tilde{U}^H C \tilde{U}$ is upper triangular. Thus, if $Q = U\operatorname{diag}(1, \tilde{U})$, then $Q^H A Q$ is upper triangular. □


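Example 7.1.2 If

$$A = \begin{bmatrix} 3 & 8 \\ -2 & 3 \end{bmatrix} \quad\text{and}\quad Q = \begin{bmatrix} .8944i & .4472 \\ -.4472 & -.8944i \end{bmatrix},$$

then $Q$ is unitary and

$$Q^H A Q = \begin{bmatrix} 3 + 4i & -6 \\ 0 & 3 - 4i \end{bmatrix}.$$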
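The complex Schur form of Example 7.1.2 can be checked with Python's built-in complex numbers. This sketch assumes that the printed entries $.8944$ and $.4472$ are the roundings of $2/\sqrt{5}$ and $1/\sqrt{5}$ and uses those exact values; `matmul` and `ctranspose` are hypothetical helper names. It confirms that $Q^HQ = I$ and that $Q^HAQ$ is upper triangular with the eigenvalues $3 \pm 4i$ on the diagonal.

```python
import math

s = 1 / math.sqrt(5)
A = [[3, 8], [-2, 3]]
# Assumption: .8944 and .4472 in the text are 2/sqrt(5) and 1/sqrt(5) rounded.
Q = [[2j * s, s], [-s, -2j * s]]

def matmul(P, R):
    return [[sum(P[i][k] * R[k][j] for k in range(len(R))) for j in range(len(R[0]))]
            for i in range(len(P))]

def ctranspose(P):
    """Conjugate transpose P^H."""
    return [[complex(P[j][i]).conjugate() for j in range(len(P))] for i in range(len(P[0]))]

QH = ctranspose(Q)
QHQ = matmul(QH, Q)             # identity up to rounding: Q is unitary
T = matmul(matmul(QH, A), Q)    # Schur form: upper triangular, 3 +/- 4i on the diagonal
for row in T:
    print(["%.6f%+.6fi" % (z.real, z.imag) for z in row])
```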


If $Q = [q_1, \ldots, q_n]$ is a column partitioning of the unitary matrix $Q$ in (7.1.3), then the $q_i$ are referred to as Schur vectors. By equating columns in the equations $AQ = QT$ we see that the Schur vectors satisfy

$$Aq_k = \lambda_k q_k + \sum_{i=1}^{k-1} n_{ik} q_i, \qquad k = 1{:}n. \tag{7.1.4}$$

From this we conclude that the subspaces

$$S_k = \operatorname{span}\{q_1, \ldots, q_k\}, \qquad k = 1{:}n$$

are invariant. Moreover, it is not hard to show that if $Q_k = [q_1, \ldots, q_k]$, then $\lambda(Q_k^H A Q_k) = \{\lambda_1, \ldots, \lambda_k\}$. Since the eigenvalues in (7.1.3) can be arbitrarily ordered, it follows that there is at least one $k$-dimensional invariant subspace associated with each subset of $k$ eigenvalues.

Another conclusion to be drawn from (7.1.4) is that the Schur vector $q_k$ is an eigenvector if and only if the $k$-th column of $N$ is zero. This turns out to be the case for $k = 1{:}n$ whenever $A^H A = AA^H$. Matrices that satisfy this property are called normal.
Corollary 7.1.4 $A \in \mathbb{C}^{n\times n}$ is normal if and only if there exists a unitary $Q \in \mathbb{C}^{n\times n}$ such that $Q^H A Q = \operatorname{diag}(\lambda_1, \ldots, \lambda_n)$.

Proof. It is easy to show that if $A$ is unitarily similar to a diagonal matrix, then $A$ is normal. On the other hand, if $A$ is normal and $Q^H A Q = T$ is its Schur decomposition, then $T$ is also normal. The corollary follows by showing that a normal, upper triangular matrix is diagonal. □


Note that if $Q^H A Q = T = \operatorname{diag}(\lambda_i) + N$ is a Schur decomposition of a general $n$-by-$n$ matrix $A$, then $\|N\|_F$ is independent of the choice of $Q$:

$$\|N\|_F^2 = \|A\|_F^2 - \sum_{i=1}^{n} |\lambda_i|^2 \equiv \Delta^2(A).$$

This quantity is referred to as $A$'s departure from normality. Thus, to make $T$ "more diagonal," it is necessary to rely on nonunitary similarity transformations.
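Because $\Delta(A)$ depends only on $\|A\|_F$ and the eigenvalues, it can be evaluated without forming a Schur decomposition. The sketch below is plain Python restricted to the 2-by-2 case (eigenvalues from the quadratic formula); the function name `departure_from_normality` is an illustrative choice. For $A = [3\ 8;\ -2\ 3]$ it returns 6, the magnitude of the single off-diagonal entry of the triangular factor of that matrix, and for a symmetric matrix it returns zero.

```python
import cmath

def departure_from_normality(a, b, c, d):
    """Delta(A) = sqrt(||A||_F^2 - sum |lambda_i|^2) for A = [[a, b], [c, d]]."""
    tr, det = a + d, a * d - b * c
    disc = cmath.sqrt(tr * tr / 4 - det)
    lams = (tr / 2 + disc, tr / 2 - disc)
    fro2 = abs(a) ** 2 + abs(b) ** 2 + abs(c) ** 2 + abs(d) ** 2
    # Clamp tiny negative values caused by rounding before taking the square root.
    return max(fro2 - sum(abs(l) ** 2 for l in lams), 0.0) ** 0.5

print(departure_from_normality(3, 8, -2, 3))   # nonnormal: equals ||N||_F
print(departure_from_normality(2, 1, 1, 2))    # symmetric, hence normal: zero
```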


7.1.4 Nonunitary Reductions 


To see what is involved in nonunitary similarity reduction, we examine the 
block diagonalization of a 2-by-2 block triangular matrix. 


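Lemma 7.1.5 Let $T \in \mathbb{C}^{n\times n}$ be partitioned as follows:

$$T = \begin{bmatrix} T_{11} & T_{12} \\ 0 & T_{22} \end{bmatrix} \begin{matrix} p \\ q \end{matrix}$$

Define the linear transformation $\phi: \mathbb{C}^{p\times q} \to \mathbb{C}^{p\times q}$ by

$$\phi(X) = T_{11}X - XT_{22}$$

where $X \in \mathbb{C}^{p\times q}$. Then $\phi$ is nonsingular if and only if $\lambda(T_{11}) \cap \lambda(T_{22}) = \emptyset$. If $\phi$ is nonsingular and $Y$ is defined by

$$Y = \begin{bmatrix} I_p & Z \\ 0 & I_q \end{bmatrix}, \qquad \phi(Z) = -T_{12},$$

then $Y^{-1}TY = \operatorname{diag}(T_{11}, T_{22})$.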


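Proof. Suppose $\phi(X) = 0$ for $X \neq 0$ and that

$$U^H X V = \begin{bmatrix} \Sigma_r & 0 \\ 0 & 0 \end{bmatrix} \begin{matrix} r \\ p-r \end{matrix}$$

is the SVD of $X$ with $\Sigma_r = \operatorname{diag}(\sigma_i)$, $r = \operatorname{rank}(X)$. Substituting this into the equation $T_{11}X = XT_{22}$ gives

$$\begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} \begin{bmatrix} \Sigma_r & 0 \\ 0 & 0 \end{bmatrix} = \begin{bmatrix} \Sigma_r & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{bmatrix}$$

where $U^H T_{11} U = (A_{ij})$ and $V^H T_{22} V = (B_{ij})$. By comparing blocks we see that $A_{21} = 0$, $B_{12} = 0$, and $\lambda(A_{11}) = \lambda(B_{11})$. Consequently,

$$\emptyset \neq \lambda(A_{11}) = \lambda(B_{11}) \subseteq \lambda(T_{11}) \cap \lambda(T_{22}).$$

On the other hand, if $\lambda \in \lambda(T_{11}) \cap \lambda(T_{22})$, then we have nonzero vectors $x$ and $y$ so $T_{11}x = \lambda x$ and $y^H T_{22} = \lambda y^H$. A calculation shows that $\phi(xy^H) = 0$. Finally, if $\phi$ is nonsingular, then the matrix $Z$ above exists and

$$Y^{-1}TY = \begin{bmatrix} I & -Z \\ 0 & I \end{bmatrix} \begin{bmatrix} T_{11} & T_{12} \\ 0 & T_{22} \end{bmatrix} \begin{bmatrix} I & Z \\ 0 & I \end{bmatrix} = \begin{bmatrix} T_{11} & T_{11}Z - ZT_{22} + T_{12} \\ 0 & T_{22} \end{bmatrix} = \begin{bmatrix} T_{11} & 0 \\ 0 & T_{22} \end{bmatrix}. \quad □$$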


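Example 7.1.3 If

$$T = \begin{bmatrix} 1 & 2 & 3 \\ 0 & 3 & 8 \\ 0 & -2 & 3 \end{bmatrix} \quad\text{and}\quad Y = \begin{bmatrix} 1.0 & 0.5 & -0.5 \\ 0.0 & 1.0 & 0.0 \\ 0.0 & 0.0 & 1.0 \end{bmatrix},$$

then

$$Y^{-1}TY = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 3 & 8 \\ 0 & -2 & 3 \end{bmatrix}.$$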

By repeatedly applying Lemma 7.1.5, we can establish the following more 
general result: 


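Theorem 7.1.6 (Block Diagonal Decomposition) Suppose

$$Q^H A Q = T = \begin{bmatrix} T_{11} & T_{12} & \cdots & T_{1q} \\ 0 & T_{22} & \cdots & T_{2q} \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & T_{qq} \end{bmatrix} \tag{7.1.5}$$

is a Schur decomposition of $A \in \mathbb{C}^{n\times n}$ and assume that the $T_{ii}$ are square. If $\lambda(T_{ii}) \cap \lambda(T_{jj}) = \emptyset$ whenever $i \neq j$, then there exists a nonsingular matrix $Y \in \mathbb{C}^{n\times n}$ such that

$$(QY)^{-1}A(QY) = \operatorname{diag}(T_{11}, \ldots, T_{qq}). \tag{7.1.6}$$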
Proof. A proof can be obtained by using Lemma 7.1.5 and induction. □


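If each diagonal block $T_{ii}$ is associated with a distinct eigenvalue, then we obtain

Corollary 7.1.7 If $A \in \mathbb{C}^{n\times n}$, then there exists a nonsingular $X$ such that

$$X^{-1}AX = \operatorname{diag}(\lambda_1 I + N_1, \ldots, \lambda_q I + N_q), \qquad N_i \in \mathbb{C}^{n_i\times n_i} \tag{7.1.7}$$

where $\lambda_1, \ldots, \lambda_q$ are distinct, the integers $n_1, \ldots, n_q$ satisfy $n_1 + \cdots + n_q = n$, and each $N_i$ is strictly upper triangular.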


A number of important terms are connected with decomposition (7.1.7). The integer $n_i$ is referred to as the algebraic multiplicity of $\lambda_i$. If $n_i = 1$, then $\lambda_i$ is said to be simple. The geometric multiplicity of $\lambda_i$ equals the dimension of $\operatorname{null}(N_i)$, i.e., the number of linearly independent eigenvectors associated with $\lambda_i$. If the algebraic multiplicity of $\lambda_i$ exceeds its geometric multiplicity, then $\lambda_i$ is said to be a defective eigenvalue. A matrix with a defective eigenvalue is referred to as a defective matrix. Nondefective matrices are also said to be diagonalizable in light of the following result:


Corollary 7.1.8 (Diagonal Form) $A \in \mathbb{C}^{n\times n}$ is nondefective if and only if there exists a nonsingular $X \in \mathbb{C}^{n\times n}$ such that

$$X^{-1}AX = \operatorname{diag}(\lambda_1, \ldots, \lambda_n). \tag{7.1.8}$$

Proof. $A$ is nondefective if and only if there exist independent vectors $x_1, \ldots, x_n \in \mathbb{C}^n$ and scalars $\lambda_1, \ldots, \lambda_n$ such that $Ax_i = \lambda_i x_i$ for $i = 1{:}n$. This is equivalent to the existence of a nonsingular $X = [x_1, \ldots, x_n] \in \mathbb{C}^{n\times n}$ such that $AX = XD$ where $D = \operatorname{diag}(\lambda_1, \ldots, \lambda_n)$. □

Note that if $y_i^H$ is the $i$th row of $X^{-1}$, then $y_i^H A = \lambda_i y_i^H$. Thus, the columns of $X^{-H}$ are left eigenvectors and the columns of $X$ are right eigenvectors.


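Example 7.1.4 If

$$A = \begin{bmatrix} 5 & -1 \\ -2 & 6 \end{bmatrix} \quad\text{and}\quad X = \begin{bmatrix} 1 & 1 \\ 1 & -2 \end{bmatrix},$$

then $X^{-1}AX = \operatorname{diag}(4, 7)$.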
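Example 7.1.4 and the remark on left eigenvectors can be confirmed with exact rational arithmetic. This sketch uses Python's `fractions` module; `matmul` and `inv2` are hypothetical helper names. Since the data are real, the rows of $X^{-1}$ are the conjugate transposes of the left eigenvectors directly.

```python
from fractions import Fraction as F

A = [[F(5), F(-1)], [F(-2), F(6)]]
X = [[F(1), F(1)], [F(1), F(-2)]]

def matmul(P, R):
    return [[sum(P[i][k] * R[k][j] for k in range(len(R))) for j in range(len(R[0]))]
            for i in range(len(P))]

def inv2(P):
    """Inverse of a nonsingular 2-by-2 matrix."""
    det = P[0][0] * P[1][1] - P[0][1] * P[1][0]
    return [[P[1][1] / det, -P[0][1] / det], [-P[1][0] / det, P[0][0] / det]]

Xinv = inv2(X)
D = matmul(matmul(Xinv, A), X)   # should be diag(4, 7)

# Each row y_i^T of X^{-1} satisfies y_i^T A = lambda_i y_i^T (a left eigenvector).
for i, lam in enumerate([4, 7]):
    yT = [Xinv[i]]
    assert matmul(yT, A) == [[lam * v for v in Xinv[i]]]
print(D)
```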
If we partition the matrix $X$ in (7.1.7),

$$X = [\,\underbrace{X_1}_{n_1}, \ldots, \underbrace{X_q}_{n_q}\,],$$

then $\mathbb{C}^n = \operatorname{ran}(X_1) \oplus \cdots \oplus \operatorname{ran}(X_q)$, a direct sum of invariant subspaces. If the bases for these subspaces are chosen in a special way, then it is possible to introduce even more zeroes into the upper triangular portion of $X^{-1}AX$.




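Theorem 7.1.9 (Jordan Decomposition) If $A \in \mathbb{C}^{n\times n}$, then there exists a nonsingular $X \in \mathbb{C}^{n\times n}$ such that $X^{-1}AX = \operatorname{diag}(J_1, \ldots, J_t)$ where

$$J_i = \begin{bmatrix} \lambda_i & 1 & & 0 \\ & \lambda_i & \ddots & \\ & & \ddots & 1 \\ 0 & & & \lambda_i \end{bmatrix}$$

is $m_i$-by-$m_i$ and $m_1 + \cdots + m_t = n$.

Proof. See Halmos (1958, pp. 112 ff.) □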


The $J_i$ are referred to as Jordan blocks. The number and dimensions of the Jordan blocks associated with each distinct eigenvalue are unique, although their ordering along the diagonal is not.


7.1.5 Some Comments on Nonunitary Similarity 


The Jordan block structure of a defective matrix is difficult to determine numerically. The set of $n$-by-$n$ diagonalizable matrices is dense in $\mathbb{C}^{n\times n}$, and thus, small changes in a defective matrix can radically alter its Jordan form. We have more to say about this in §7.6.5.

A related difficulty that arises in the eigenvalue problem is that a nearly defective matrix can have a poorly conditioned matrix of eigenvectors. For example, any matrix $X$ that diagonalizes

$$A = \begin{bmatrix} 1 + \epsilon & 1 \\ 0 & 1 - \epsilon \end{bmatrix}, \qquad 0 < \epsilon \ll 1 \tag{7.1.9}$$

has a 2-norm condition of order $1/\epsilon$.
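The growth of $\kappa_2(X)$ for (7.1.9) is easy to observe. In the sketch below (plain Python), the eigenvectors are $e_1$ for $1 + \epsilon$ and $[1, -2\epsilon]^T$ for $1 - \epsilon$, normalized to unit length; that normalization is one convenient choice of $X$, assumed here for illustration. The singular values of a 2-by-2 matrix are obtained from $\|X\|_F$ and $|\det X|$, using $\sigma_1^2 + \sigma_2^2 = \|X\|_F^2$ and $\sigma_1\sigma_2 = |\det X|$; `sv2` and `eigvec_cond` are hypothetical helper names.

```python
import math

def sv2(P):
    """Singular values (s_max, s_min) of a real 2-by-2 matrix, closed form."""
    fro2 = sum(v * v for row in P for v in row)
    det = abs(P[0][0] * P[1][1] - P[0][1] * P[1][0])
    s_max = math.sqrt((fro2 + math.sqrt(max(fro2 * fro2 - 4 * det * det, 0.0))) / 2)
    return s_max, det / s_max           # s_max * s_min = |det|

def eigvec_cond(eps):
    """kappa_2 of the unit-column eigenvector matrix of A = [[1+eps, 1], [0, 1-eps]]."""
    # lambda = 1 + eps: x = e_1.  lambda = 1 - eps: x = [1, -2*eps] / ||[1, -2*eps]||.
    nrm = math.hypot(1.0, 2 * eps)
    X = [[1.0, 1.0 / nrm], [0.0, -2 * eps / nrm]]
    s_max, s_min = sv2(X)
    return s_max / s_min

for eps in (1e-1, 1e-2, 1e-3, 1e-4):
    k = eigvec_cond(eps)
    print("eps=%.0e  kappa_2(X)=%.4e  eps*kappa_2(X)=%.6f" % (eps, k, eps * k))
```

The product $\epsilon\,\kappa_2(X)$ stays close to 1, consistent with the order-$1/\epsilon$ claim (and with P7.1.2 below).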


These observations serve to highlight the difficulties associated with ill-conditioned similarity transformations. Since

$$\mathrm{fl}(X^{-1}AX) = X^{-1}AX + E, \tag{7.1.10}$$

where

$$\|E\|_2 \approx \mathbf{u}\,\kappa_2(X)\,\|A\|_2, \tag{7.1.11}$$

it is clear that large errors can be introduced into an eigenvalue calculation when we depart from unitary similarity.
7.1.6 Singular Values and Eigenvalues


Since the singular values of $A$ and its Schur decomposition $Q^H A Q = \operatorname{diag}(\lambda_i) + N$ are the same, it follows that

$$\sigma_{\min}(A) \le \min_i |\lambda_i| \le \max_i |\lambda_i| \le \sigma_{\max}(A).$$

From what we know about the condition of triangular matrices, it may be the case that

$$\max_{i,j} \frac{|\lambda_i|}{|\lambda_j|} \ll \kappa_2(A).$$

This is a reminder that for nonnormal matrices, eigenvalues do not have the "predictive power" of singular values when it comes to $Ax = b$ sensitivity matters. Eigenvalues of nonnormal matrices have other shortcomings. See §11.3.4.
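Both displayed relations can be seen on 2-by-2 examples. The sketch below (plain Python) checks $\sigma_{\min} \le |\lambda_i| \le \sigma_{\max}$ and compares the eigenvalue modulus ratio with $\kappa_2(A)$. The matrix $[1\ 100;\ 0\ 1]$ is a hypothetical example chosen because both its eigenvalues equal 1 while $\kappa_2 \approx 10^4$; `sv2` and `eig2` are illustrative helper names.

```python
import cmath
import math

def sv2(P):
    """(sigma_max, sigma_min) of a 2-by-2 matrix from ||P||_F and |det P|."""
    fro2 = sum(abs(v) ** 2 for row in P for v in row)
    det = abs(P[0][0] * P[1][1] - P[0][1] * P[1][0])
    s_max = math.sqrt((fro2 + math.sqrt(max(fro2 * fro2 - 4 * det * det, 0.0))) / 2)
    return s_max, det / s_max

def eig2(P):
    """Eigenvalues of a 2-by-2 matrix as roots of z^2 - tr(P) z + det(P)."""
    tr = P[0][0] + P[1][1]
    det = P[0][0] * P[1][1] - P[0][1] * P[1][0]
    r = cmath.sqrt(tr * tr / 4 - det)
    return tr / 2 + r, tr / 2 - r

results = []
for A in ([[3.0, 8.0], [-2.0, 3.0]], [[1.0, 100.0], [0.0, 1.0]]):
    s_max, s_min = sv2(A)
    mods = sorted(abs(l) for l in eig2(A))
    results.append((s_min, mods[0], mods[1], s_max))
    print("sigma_min=%.4g  |lambda| in [%.4g, %.4g]  sigma_max=%.4g" % (s_min, mods[0], mods[1], s_max))
    print("  max|l_i|/|l_j| = %.4g   kappa_2(A) = %.4g" % (mods[1] / mods[0], s_max / s_min))
```

For the second matrix the eigenvalue ratio is 1 while $\kappa_2(A)$ exceeds $10^4$, so the eigenvalues say nothing about how hard $Ax = b$ is to solve.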


Problems 


P7.1.1 Show that if $T \in \mathbb{C}^{n\times n}$ is upper triangular and normal, then $T$ is diagonal.
P7.1.2 Verify that if $X$ diagonalizes the 2-by-2 matrix in (7.1.9) and $\epsilon \le 1/2$, then $\kappa_2(X) \ge 1/\epsilon$.

P7.1.3 Suppose $A \in \mathbb{C}^{n\times n}$ has distinct eigenvalues. Show that if $Q^H A Q = T$ is its Schur decomposition and $AB = BA$, then $Q^H B Q$ is upper triangular.


P7.1.4 Show that if $A$ and $B^H$ are in $\mathbb{C}^{m\times n}$ with $m \ge n$, then

$$\lambda(AB) = \lambda(BA) \cup \{\underbrace{0, \ldots, 0}_{m-n}\}.$$


P7.1.5 Given $A \in \mathbb{C}^{n\times n}$, use the Schur decomposition to show that for every $\epsilon > 0$, there exists a diagonalizable matrix $B$ such that $\|A - B\|_2 \le \epsilon$. This shows that the set of diagonalizable matrices is dense in $\mathbb{C}^{n\times n}$ and that the Jordan canonical form is not a continuous matrix decomposition.


P7.1.6 Suppose $A_k \to A$ and that $Q_k^H A_k Q_k = T_k$ is a Schur decomposition of $A_k$. Show that $\{Q_k\}$ has a converging subsequence $\{Q_{k_i}\}$ with the property that

$$\lim_{i\to\infty} Q_{k_i} = Q$$

where $Q^H A Q = T$ is upper triangular. This shows that the eigenvalues of a matrix are continuous functions of its entries.


P7.1.7 Justify (7.1.10) and (7.1.11). 
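P7.1.8 Show how to compute the eigenvalues of

$$M = \begin{bmatrix} A & C \\ B & D \end{bmatrix} \begin{matrix} k \\ j \end{matrix}$$

(block columns of width $k$ and $j$), where $A$, $B$, $C$, and $D$ are given real diagonal matrices.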
P7.1.9 Use the JCF to show that if all the eigenvalues of a matrix $A$ are strictly less than unity in modulus, then $\lim_{k\to\infty} A^k = 0$.
P7.1.10 The initial value problem

$$\dot{x}(t) = y(t), \quad x(0) = 1; \qquad \dot{y}(t) = -x(t), \quad y(0) = 0$$

has solution $x(t) = \cos(t)$ and $y(t) = -\sin(t)$. Let $h > 0$. Here are three reasonable iterations that can be used to compute approximations $x_k \approx x(kh)$ and $y_k \approx y(kh)$ assuming that $x_0 = 1$ and $y_0 = 0$:

Method 1: $x_{k+1} = x_k + hy_k$, $\quad y_{k+1} = y_k - hx_k$
Method 2: $x_{k+1} = x_k + hy_k$, $\quad y_{k+1} = y_k - hx_{k+1}$
Method 3: $x_{k+1} = x_k + hy_{k+1}$, $\quad y_{k+1} = y_k - hx_{k+1}$

Express each method in the form

$$\begin{bmatrix} x_{k+1} \\ y_{k+1} \end{bmatrix} = A_h \begin{bmatrix} x_k \\ y_k \end{bmatrix}$$

where $A_h$ is a 2-by-2 matrix. For each case, compute $\lambda(A_h)$ and use the previous problem to discuss $\lim x_k$ and $\lim y_k$ as $k \to \infty$.


P7.1.11 If $J \in \mathbb{R}^{d\times d}$ is a Jordan block, what is $\kappa_\infty(J)$?
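P7.1.12 Show that if

$$R = \begin{bmatrix} R_{11} & R_{12} \\ 0 & R_{22} \end{bmatrix} \begin{matrix} p \\ q \end{matrix}$$

is normal and $\lambda(R_{11}) \cap \lambda(R_{22}) = \emptyset$, then $R_{12} = 0$.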


Notes and References for Sec. 7.1 


The mathematical properties of the algebraic eigenvalue problem are elegantly covered in 
Wilkinson (1965, chapter 1) and Stewart (1973, chapter 6). For those who need further 
review we also recommend 


R. Bellman (1970). Introduction to Matrix Analysis, 2nd ed., McGraw-Hill, New York.

I. Gohberg, P. Lancaster, and L. Rodman (1986). Invariant Subspaces of Matrices With Applications, John Wiley and Sons, New York.

M. Marcus and H. Minc (1964). A Survey of Matrix Theory and Matrix Inequalities, Allyn and Bacon, Boston.

L. Mirsky (1963). An Introduction to Linear Algebra, Oxford University Press, Oxford.


The Schur decomposition originally appeared in 


I. Schur (1909). "On the Characteristic Roots of a Linear Substitution with an Application to the Theory of Integral Equations," Math. Ann. 66, 488-510 (German).


A proof very similar to ours is given on page 105 of 


H.W. Turnbull and A.C. Aitken (1961). An Introduction to the Theory of Canonical 
Forms, Dover, New York. 
Connections between singular values, eigenvalues, and pseudoeigenvalues (see §11.3.4) are discussed in


K.-C. Toh and L.N. Trefethen (1994). "Pseudozeros of Polynomials and Pseudospectra of Companion Matrices," Numer. Math. 68, 403-425.

F. Kittaneh (1995). "Singular Values of Companion Matrices and Bounds on Zeros of Polynomials," SIAM J. Matrix Anal. Appl. 16, 333-340.


7.2 Perturbation Theory 


The act of computing eigenvalues is the act of computing zeros of the char- 
acteristic polynomial. Galois theory tells us that such a process has to be 
iterative if n > 4 and so errors will arise because of finite termination. In 
order to develop intelligent stopping criteria we need an informative per- 
turbation theory that tells us how to think about approximate eigenvalues 
and invariant subspaces. 


7.2.1 Eigenvalue Sensitivity 


Several eigenvalue routines produce a sequence of similarity transformations $X_k$ with the property that the matrices $X_k^{-1}AX_k$ are progressively "more diagonal." The question naturally arises, how well do the diagonal elements of a matrix approximate its eigenvalues?


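Theorem 7.2.1 (Gershgorin Circle Theorem) If $X^{-1}AX = D + F$ where $D = \operatorname{diag}(d_1, \ldots, d_n)$ and $F$ has zero diagonal entries, then

$$\lambda(A) \subseteq \bigcup_{i=1}^{n} D_i$$

where $D_i = \{z \in \mathbb{C} : |z - d_i| \le \sum_{j=1}^{n} |f_{ij}|\}$.

Proof. Suppose $\lambda \in \lambda(A)$ and assume without loss of generality that $\lambda \neq d_i$ for $i = 1{:}n$. Since $(D - \lambda I) + F$ is singular, it follows from Lemma 2.3.3 that

$$1 \le \|(D - \lambda I)^{-1}F\|_\infty = \sum_{j=1}^{n} \frac{|f_{kj}|}{|d_k - \lambda|}$$

for some $k$, $1 \le k \le n$. But this implies that $\lambda \in D_k$. □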


It can also be shown that if the Gershgorin disk $D_i$ is isolated from the other disks, then it contains precisely one of $A$'s eigenvalues. See Wilkinson (1965, pp. 71ff.).


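Example 7.2.1 If

$$A = \begin{bmatrix} 10 & 2 & 3 \\ -1 & 0 & 2 \\ 1 & -2 & 1 \end{bmatrix},$$

then $\lambda(A) \approx \{10.226,\ .3870 + 2.2216i,\ .3870 - 2.2216i\}$ and the Gershgorin disks are $D_1 = \{z : |z - 10| \le 5\}$, $D_2 = \{z : |z| \le 3\}$, and $D_3 = \{z : |z - 1| \le 3\}$.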
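Gershgorin disks of a matrix itself correspond to taking $X = I$ in Theorem 7.2.1: the centers are the diagonal entries and the radii are the off-diagonal absolute row sums. The sketch below (plain Python) uses the matrix $A = [10\ 2\ 3;\ -1\ 0\ 2;\ 1\ -2\ 1]$ of Example 7.2.1, checks that the eigenvalues quoted there lie in the union of the disks, and confirms that they nearly annihilate the characteristic polynomial $z^3 - \operatorname{tr}(A)z^2 + m_2 z - \det(A)$, where $m_2$ is the sum of the principal 2-by-2 minors. The function names are illustrative.

```python
def gershgorin_disks(A):
    """(center, radius) pairs: center a_ii, radius sum of |a_ij| over j != i."""
    n = len(A)
    return [(A[i][i], sum(abs(A[i][j]) for j in range(n) if j != i)) for i in range(n)]

def in_union(z, disks):
    return any(abs(z - c) <= r for c, r in disks)

def charpoly3(A):
    """Characteristic polynomial z^3 - tr z^2 + m2 z - det of a 3-by-3 matrix."""
    tr = A[0][0] + A[1][1] + A[2][2]
    m2 = (A[0][0] * A[1][1] - A[0][1] * A[1][0] + A[0][0] * A[2][2] - A[0][2] * A[2][0]
          + A[1][1] * A[2][2] - A[1][2] * A[2][1])
    det = (A[0][0] * (A[1][1] * A[2][2] - A[1][2] * A[2][1])
           - A[0][1] * (A[1][0] * A[2][2] - A[1][2] * A[2][0])
           + A[0][2] * (A[1][0] * A[2][1] - A[1][1] * A[2][0]))
    return lambda z: z ** 3 - tr * z ** 2 + m2 * z - det

A = [[10, 2, 3], [-1, 0, 2], [1, -2, 1]]
disks = gershgorin_disks(A)
print(disks)   # [(10, 5), (0, 3), (1, 3)]

# Eigenvalues quoted in Example 7.2.1 (rounded to the printed digits).
lams = [10.226, 0.3870 + 2.2216j, 0.3870 - 2.2216j]
p = charpoly3(A)
print([in_union(l, disks) for l in lams])
print([abs(p(l)) for l in lams])   # small residuals, limited by the 4-5 printed digits
```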


For some very important eigenvalue routines it is possible to show that the computed eigenvalues are the exact eigenvalues of a matrix $A + E$ where $E$ is small in norm. Consequently, we must understand how the eigenvalues of a matrix can be affected by small perturbations. A sample result that sheds light on this issue is the following theorem.


Theorem 7.2.2 (Bauer-Fike) If $\mu$ is an eigenvalue of $A + E \in \mathbb{C}^{n\times n}$ and $X^{-1}AX = D = \operatorname{diag}(\lambda_1, \ldots, \lambda_n)$, then

$$\min_{\lambda \in \lambda(A)} |\lambda - \mu| \le \kappa_p(X)\,\|E\|_p$$

where $\|\cdot\|_p$ denotes any of the $p$-norms.

Proof. We need only consider the case when $\mu$ is not in $\lambda(A)$. Since $\mu \in \lambda(A + E)$, the matrix $X^{-1}(A + E - \mu I)X$ is singular, and hence so is $I + (D - \mu I)^{-1}(X^{-1}EX)$. Thus, from Lemma 2.3.3 we obtain

$$1 \le \|(D - \mu I)^{-1}(X^{-1}EX)\|_p \le \|(D - \mu I)^{-1}\|_p\,\|X\|_p\,\|E\|_p\,\|X^{-1}\|_p.$$

Since $(D - \mu I)^{-1}$ is diagonal and the $p$-norm of a diagonal matrix is the absolute value of the largest diagonal entry, it follows that

$$\|(D - \mu I)^{-1}\|_p = \frac{1}{\min_{\lambda \in \lambda(A)} |\lambda - \mu|},$$

from which the theorem follows. □
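The Bauer-Fike bound can be tested numerically in the 2-norm. This sketch (plain Python) reuses the diagonalization of Example 7.1.4, $A = [5\ -1;\ -2\ 6]$ with $X = [1\ 1;\ 1\ -2]$, so $\kappa_2(X) \approx 1.77$. The perturbation $E = t\,[0\ 1;\ -1\ 0]$ is a hypothetical choice (a scaled rotation, so $\|E\|_2 = t$ exactly). For each eigenvalue $\mu$ of $A + E$ it measures the distance to $\lambda(A)$ and compares the worst case with $\kappa_2(X)\|E\|_2$.

```python
import cmath
import math

def eig2(P):
    """Eigenvalues of a 2-by-2 matrix as roots of z^2 - tr(P) z + det(P)."""
    tr = P[0][0] + P[1][1]
    det = P[0][0] * P[1][1] - P[0][1] * P[1][0]
    r = cmath.sqrt(tr * tr / 4 - det)
    return [tr / 2 + r, tr / 2 - r]

def norm2(P):
    """Spectral norm of a real 2-by-2 matrix: its largest singular value."""
    fro2 = sum(v * v for row in P for v in row)
    det = P[0][0] * P[1][1] - P[0][1] * P[1][0]
    return math.sqrt((fro2 + math.sqrt(max(fro2 * fro2 - 4 * det * det, 0.0))) / 2)

# Example 7.1.4: X^{-1} A X = diag(4, 7).
A = [[5.0, -1.0], [-2.0, 6.0]]
X = [[1.0, 1.0], [1.0, -2.0]]
detX = X[0][0] * X[1][1] - X[0][1] * X[1][0]
Xinv = [[X[1][1] / detX, -X[0][1] / detX], [-X[1][0] / detX, X[0][0] / detX]]
kappa = norm2(X) * norm2(Xinv)          # kappa_2(X)

lam_A = eig2(A)
checks = []
for t in (1e-1, 1e-2, 1e-3):
    # Hypothetical perturbation: t times a rotation, so ||E||_2 = t.
    E = [[0.0, t], [-t, 0.0]]
    AE = [[A[i][j] + E[i][j] for j in range(2)] for i in range(2)]
    worst = max(min(abs(lam - mu) for lam in lam_A) for mu in eig2(AE))
    checks.append((worst, kappa * t))
    print("||E||_2=%.0e  max distance=%.3e  Bauer-Fike bound=%.3e" % (t, worst, kappa * t))
```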


An analogous result can be obtained via the Schur decomposition: 


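Theorem 7.2.3 Let $Q^H A Q = D + N$ be a Schur decomposition of $A \in \mathbb{C}^{n\times n}$ as in (7.1.3). If $\mu \in \lambda(A + E)$ and $p$ is the smallest positive integer such that $|N|^p = 0$, then

$$\min_{\lambda \in \lambda(A)} |\lambda - \mu| \le \max(\theta, \theta^{1/p})$$

where

$$\theta = \|E\|_2 \sum_{k=0}^{p-1} \|N\|_2^k.$$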
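Proof. Define

$$\delta = \min_{\lambda \in \lambda(A)} |\lambda - \mu| = \frac{1}{\|(\mu I - D)^{-1}\|_2}.$$

The theorem is clearly true if $\delta = 0$. If $\delta > 0$, then $I - (\mu I - A)^{-1}E$ is singular and by Lemma 2.3.3 we have

$$1 \le \|(\mu I - A)^{-1}E\|_2 \le \|(\mu I - A)^{-1}\|_2\,\|E\|_2 = \|((\mu I - D) - N)^{-1}\|_2\,\|E\|_2. \tag{7.2.1}$$

Since $(\mu I - D)^{-1}$ is diagonal and $|N|^p = 0$, it is not hard to show that $((\mu I - D)^{-1}N)^p = 0$. Thus,

$$((\mu I - D) - N)^{-1} = \sum_{k=0}^{p-1} \left((\mu I - D)^{-1}N\right)^k (\mu I - D)^{-1}$$

and so

$$\|((\mu I - D) - N)^{-1}\|_2 \le \frac{1}{\delta} \sum_{k=0}^{p-1} \left(\frac{\|N\|_2}{\delta}\right)^k.$$

If $\delta > 1$, then

$$\|((\mu I - D) - N)^{-1}\|_2 \le \frac{1}{\delta} \sum_{k=0}^{p-1} \|N\|_2^k$$

and so from (7.2.1), $\delta \le \theta$. If $\delta \le 1$, then

$$\|((\mu I - D) - N)^{-1}\|_2 \le \frac{1}{\delta^p} \sum_{k=0}^{p-1} \|N\|_2^k$$

and so from (7.2.1), $\delta^p \le \theta$. Thus, $\delta \le \max(\theta, \theta^{1/p})$. □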


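Example 7.2.2 If

$$A = \begin{bmatrix} 1 & 2 & 3 \\ 0 & 4 & 5 \\ 0 & 0 & 4.001 \end{bmatrix} \quad\text{and}\quad E = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ .001 & 0 & 0 \end{bmatrix},$$

then $\lambda(A + E) \approx \{1.0001, 4.0582, 3.9427\}$ and $A$'s matrix of eigenvectors satisfies $\kappa_2(X) \approx 10^7$. The Bauer-Fike bound in Theorem 7.2.2 has order $10^4$, while the Schur bound in Theorem 7.2.3 has order $10^0$.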


Theorems 7.2.2 and 7.2.3 each indicate potential eigenvalue sensitivity if $A$ is nonnormal. Specifically, if $\kappa_2(X)$ or $\|N\|_2^{p-1}$ is large, then small changes in $A$ can induce large changes in the eigenvalues.


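Example 7.2.3 If

$$A = \begin{bmatrix} 0 & I_9 \\ 0 & 0 \end{bmatrix} \quad\text{and}\quad E = \begin{bmatrix} 0 & 0 \\ 10^{-10} & 0 \end{bmatrix}, \qquad A, E \in \mathbb{R}^{10\times 10},$$

then for all $\lambda \in \lambda(A)$ and $\mu \in \lambda(A + E)$, $|\lambda - \mu| = 10^{-1}$. In this example a change of order $10^{-10}$ in $A$ results in a change of order $10^{-1}$ in its eigenvalues.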


7.2.2 The Condition of a Simple Eigenvalue 


Extreme eigenvalue sensitivity for a matrix $A$ cannot occur if $A$ is normal. On the other hand, nonnormality does not necessarily imply eigenvalue sensitivity. Indeed, a nonnormal matrix can have a mixture of well-conditioned and ill-conditioned eigenvalues. For this reason, it is beneficial to refine our perturbation theory so that it is applicable to individual eigenvalues and not the spectrum as a whole.

To this end, suppose that $\lambda$ is a simple eigenvalue of $A \in \mathbb{C}^{n\times n}$ and that $x$ and $y$ satisfy $Ax = \lambda x$ and $y^H A = \lambda y^H$ with $\|x\|_2 = \|y\|_2 = 1$. If $Y^H A X = J$ is the Jordan decomposition with $Y^H = X^{-1}$, then $y$ and $x$ are nonzero multiples of $Y(:,i)$ and $X(:,i)$ for some $i$. It follows from $1 = Y(:,i)^H X(:,i)$ that $y^H x \neq 0$, a fact that we shall use shortly.

Using classical results from function theory, it can be shown that in a neighborhood of the origin there exist differentiable $x(\epsilon)$ and $\lambda(\epsilon)$ such that

$$(A + \epsilon F)x(\epsilon) = \lambda(\epsilon)x(\epsilon), \qquad \|F\|_2 = 1,$$

where $\lambda(0) = \lambda$ and $x(0) = x$. By differentiating this equation with respect to $\epsilon$ and setting $\epsilon = 0$ in the result, we obtain

$$A\dot{x}(0) + Fx = \dot{\lambda}(0)x + \lambda\dot{x}(0).$$

Applying $y^H$ to both sides of this equation, dividing by $y^H x$, and taking absolute values gives

$$|\dot{\lambda}(0)| = \left|\frac{y^H F x}{y^H x}\right| \le \frac{1}{|y^H x|}.$$

The upper bound is attained if $F = yx^H$. For this reason we refer to the reciprocal of

$$s(\lambda) = |y^H x|$$

as the condition of the eigenvalue $\lambda$.

Roughly speaking, the above analysis shows that if order $\epsilon$ perturbations are made in $A$, then an eigenvalue $\lambda$ may be perturbed by an amount $\epsilon/s(\lambda)$. Thus, if $s(\lambda)$ is small, then $\lambda$ is appropriately regarded as ill-conditioned. Note that $s(\lambda)$ is the cosine of the angle between the left and right eigenvectors associated with $\lambda$ and is unique only if $\lambda$ is simple.

A small $s(\lambda)$ implies that $A$ is near a matrix having a multiple eigenvalue. In particular, if $\lambda$ is distinct and $s(\lambda) < 1$, then there exists an $E$ such that $\lambda$ is a repeated eigenvalue of $A + E$ and

$$\frac{\|E\|_2}{\|A\|_2} \le \frac{s(\lambda)}{\sqrt{1 - s(\lambda)^2}}.$$

This result is proved in Wilkinson (1972).


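Example 7.2.4 If

$$A = \begin{bmatrix} 1 & 2 & 3 \\ 0 & 4 & 5 \\ 0 & 0 & 4.001 \end{bmatrix} \quad\text{and}\quad E = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ .001 & 0 & 0 \end{bmatrix},$$

then $\lambda(A + E) \approx \{1.0001, 4.0582, 3.9427\}$ and $s(1) \approx .8 \times 10^0$, $s(4) \approx .2 \times 10^{-3}$, and $s(4.001) \approx .2 \times 10^{-3}$. Observe that $\|E\|_2/s(\lambda)$ is a good estimate of the perturbation that each eigenvalue undergoes.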
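For an upper triangular matrix with distinct diagonal entries, right and left eigenvectors follow from back and forward substitution, so $s(\lambda) = |y^H x|/(\|x\|_2\|y\|_2)$ can be computed directly. The sketch below (plain Python, real data, so $y^H = y^T$) applies this to the matrix of Example 7.2.4; the function name `tri_eig_condition` is an illustrative choice and the approach is specific to triangular input.

```python
import math

def tri_eig_condition(R):
    """s(lambda_k) for each diagonal entry of a real upper triangular R with
    distinct diagonal entries, using unnormalized right/left eigenvectors."""
    n = len(R)
    s = []
    for k in range(n):
        lam = R[k][k]
        # Right eigenvector: x_k = 1, x_i = 0 for i > k, back substitution for i < k.
        x = [0.0] * n
        x[k] = 1.0
        for i in range(k - 1, -1, -1):
            x[i] = -sum(R[i][j] * x[j] for j in range(i + 1, k + 1)) / (R[i][i] - lam)
        # Left eigenvector (y^T R = lam y^T): y_k = 1, y_i = 0 for i < k,
        # forward substitution for j > k.
        y = [0.0] * n
        y[k] = 1.0
        for j in range(k + 1, n):
            y[j] = -sum(y[i] * R[i][j] for i in range(k, j)) / (R[j][j] - lam)
        dot = sum(a * b for a, b in zip(x, y))
        nx = math.sqrt(sum(a * a for a in x))
        ny = math.sqrt(sum(b * b for b in y))
        s.append(abs(dot) / (nx * ny))
    return s

A = [[1.0, 2.0, 3.0], [0.0, 4.0, 5.0], [0.0, 0.0, 4.001]]
s = tri_eig_condition(A)
for lam, sk in zip((1.0, 4.0, 4.001), s):
    print("lambda = %-6g s = %.3e   1/s = %.3e" % (lam, sk, 1 / sk))
```

The values come out near $.83$, $1.7 \times 10^{-4}$, and $1.7 \times 10^{-4}$, matching the rounded figures quoted in the example.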


7.2.3 Sensitivity of Repeated Eigenvalues


If $\lambda$ is a repeated eigenvalue, then the eigenvalue sensitivity question is more complicated. For example, if

$$A = \begin{bmatrix} 1 & a \\ 0 & 1 \end{bmatrix} \quad\text{and}\quad F = \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix},$$

then $\lambda(A + \epsilon F) = \{1 \pm \sqrt{\epsilon a}\}$. Note that if $a \neq 0$, then it follows that the eigenvalues of $A + \epsilon F$ are not differentiable at zero; their rate of change at the origin is infinite. In general, if $\lambda$ is a defective eigenvalue of $A$, then $O(\epsilon)$ perturbations in $A$ can result in $O(\epsilon^{1/p})$ perturbations in $\lambda$ if $\lambda$ is associated with a $p$-dimensional Jordan block. See Wilkinson (1965, pp. 77ff.) for a more detailed discussion.
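The square-root behavior is visible numerically. The sketch below (plain Python, with $a = 1$ as an assumed value) computes the eigenvalues of $A + \epsilon F = [1\ a;\ \epsilon\ 1]$ from the quadratic formula and tabulates the shift $|\lambda(\epsilon) - 1|$. The ratio shift$/\sqrt{\epsilon}$ stays at 1 while shift$/\epsilon$ grows without bound as $\epsilon \to 0$.

```python
import cmath

def eig2(P):
    """Eigenvalues of a 2-by-2 matrix as roots of z^2 - tr(P) z + det(P)."""
    tr = P[0][0] + P[1][1]
    det = P[0][0] * P[1][1] - P[0][1] * P[1][0]
    r = cmath.sqrt(tr * tr / 4 - det)
    return tr / 2 + r, tr / 2 - r

a = 1.0   # assumed value of the (1,2) entry
shifts = []
for eps in (1e-2, 1e-4, 1e-6, 1e-8):
    # A + eps*F = [[1, a], [eps, 1]] has eigenvalues 1 +/- sqrt(eps * a).
    lam1, lam2 = eig2([[1.0, a], [eps, 1.0]])
    shift = abs(lam1 - 1.0)
    shifts.append((eps, shift))
    print("eps=%.0e  shift=%.3e  shift/eps=%.3e  shift/sqrt(eps)=%.6f"
          % (eps, shift, shift / eps, shift / eps ** 0.5))
```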


7.2.4 Invariant Subspace Sensitivity 


A collection of sensitive eigenvectors can define an insensitive invariant subspace provided the corresponding cluster of eigenvalues is isolated. To be precise, suppose

$$Q^H A Q = T = \begin{bmatrix} T_{11} & T_{12} \\ 0 & T_{22} \end{bmatrix} \begin{matrix} r \\ n-r \end{matrix} \tag{7.2.2}$$

is a Schur decomposition of $A$ with

$$Q = [\,\underbrace{Q_1}_{r}\ \ \underbrace{Q_2}_{n-r}\,]. \tag{7.2.3}$$

It is clear from our discussion of eigenvector perturbation that the sensitivity of the invariant subspace $\operatorname{ran}(Q_1)$ depends on the distance between $\lambda(T_{11})$ and $\lambda(T_{22})$. The proper measure of this distance turns out to be the smallest singular value of the linear transformation $X \mapsto T_{11}X - XT_{22}$.
(Recall that this transformation figures in Lemma 7.1.5.) In particular, if we define the separation between the matrices $T_{11}$ and $T_{22}$ by

$$\operatorname{sep}(T_{11}, T_{22}) = \min_{X \neq 0} \frac{\|T_{11}X - XT_{22}\|_F}{\|X\|_F}, \tag{7.2.4}$$

then we have the following general result:


Theorem 7.2.4 Suppose that (7.2.2) and (7.2.3) hold and that for any matrix $E \in \mathbb{C}^{n\times n}$ we partition $Q^H E Q$ as follows:

$$Q^H E Q = \begin{bmatrix} E_{11} & E_{12} \\ E_{21} & E_{22} \end{bmatrix} \begin{matrix} r \\ n-r \end{matrix}.$$

If $\operatorname{sep}(T_{11}, T_{22}) > 0$ and

$$\|E\|_F \left(1 + \frac{5\|T_{12}\|_F}{\operatorname{sep}(T_{11}, T_{22})}\right) \le \frac{\operatorname{sep}(T_{11}, T_{22})}{5},$$

then there exists a $P \in \mathbb{C}^{(n-r)\times r}$ with

$$\|P\|_F \le 4\,\frac{\|E_{21}\|_F}{\operatorname{sep}(T_{11}, T_{22})}$$

such that the columns of $\hat{Q}_1 = (Q_1 + Q_2P)(I + P^HP)^{-1/2}$ are an orthonormal basis for a subspace invariant for $A + E$.

Proof. This result is a slight recasting of Theorem 4.11 in Stewart (1973) which should be consulted for proof details. See also Stewart and Sun (1990, p. 230). The matrix $(I + P^HP)^{-1/2}$ is the inverse of the square root of the Hermitian positive definite matrix $I + P^HP$. See §4.2.10. □


Corollary 7.2.5 If the assumptions in Theorem 7.2.4 hold, then

$$\operatorname{dist}(\operatorname{ran}(Q_1), \operatorname{ran}(\hat{Q}_1)) \le 4\,\frac{\|E_{21}\|_F}{\operatorname{sep}(T_{11}, T_{22})}.$$

Proof. Using the SVD of $P$, it can be shown that

$$\|P(I + P^HP)^{-1/2}\|_2 \le \|P\|_2. \tag{7.2.5}$$

The corollary follows because the required distance is the norm of $Q_2^H\hat{Q}_1 = P(I + P^HP)^{-1/2}$. □

Thus, the reciprocal of $\operatorname{sep}(T_{11}, T_{22})$ can be thought of as a condition number that measures the sensitivity of $\operatorname{ran}(Q_1)$ as an invariant subspace.


Example 7.2.5 Suppose 


3 10 0 —20 [ain 
mafo S] me [a sm] wa me [1 1] 
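and that

$$A = T = \begin{bmatrix} T_{11} & T_{12} \\ 0 & T_{22} \end{bmatrix}.$$

Observe that $AQ_1 = Q_1T_{11}$ where $Q_1 = [e_1, e_2] \in \mathbb{R}^{4\times 2}$. A calculation shows that $\operatorname{sep}(T_{11}, T_{22}) \approx .0003$. If

$$E_{21} = 10^{-6}\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}$$

and we examine the Schur decomposition of

$$A + E = \begin{bmatrix} T_{11} & T_{12} \\ E_{21} & T_{22} \end{bmatrix},$$

then we find that $Q_1$ gets perturbed to

$$\hat{Q}_1 = \begin{bmatrix} -.9999 & -.0003 \\ .0003 & -.9999 \\ .0005 & -.0026 \\ .0000 & -.0003 \end{bmatrix}.$$

Thus, we have $\operatorname{dist}(\operatorname{ran}(Q_1), \operatorname{ran}(\hat{Q}_1)) \approx .0027 \approx 10^{-6}/\operatorname{sep}(T_{11}, T_{22})$.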


7.2.5 Eigenvector Sensitivity


If we set r — 1 in the preceding subsection, then the analysis addresses the 
issue of eigenvector sensitivity. 


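Corollary 7.2.6 Suppose $A, E \in \mathbb{C}^{n\times n}$ and that $Q = [q_1\ Q_2] \in \mathbb{C}^{n\times n}$ is unitary with $q_1 \in \mathbb{C}^n$. Assume

$$Q^H A Q = \begin{bmatrix} \lambda & v^H \\ 0 & T_{22} \end{bmatrix} \begin{matrix} 1 \\ n-1 \end{matrix}, \qquad Q^H E Q = \begin{bmatrix} \epsilon & \gamma^H \\ \delta & E_{22} \end{bmatrix} \begin{matrix} 1 \\ n-1 \end{matrix}.$$

(Thus, $q_1$ is an eigenvector.) If $\sigma = \sigma_{\min}(T_{22} - \lambda I) > 0$ and

$$\|E\|_F \left(1 + \frac{5\|v\|_2}{\sigma}\right) \le \frac{\sigma}{5},$$

then there exists $p \in \mathbb{C}^{n-1}$ with

$$\|p\|_2 \le 4\,\frac{\|\delta\|_2}{\sigma}$$

such that $\hat{q}_1 = (q_1 + Q_2p)/\sqrt{1 + p^Hp}$ is a unit 2-norm eigenvector for $A + E$. Moreover,

$$\operatorname{dist}(\operatorname{span}\{q_1\}, \operatorname{span}\{\hat{q}_1\}) \le 4\,\frac{\|\delta\|_2}{\sigma}.$$

Proof. The result follows from Theorem 7.2.4, Corollary 7.2.5, and the observation that if $T_{11} = \lambda$, then $\operatorname{sep}(T_{11}, T_{22}) = \sigma_{\min}(T_{22} - \lambda I)$. □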
Note that $\sigma_{\min}(T_{22} - \lambda I)$ roughly measures the separation of $\lambda$ from the eigenvalues of $T_{22}$. We have to say "roughly" because

$$\operatorname{sep}(\lambda, T_{22}) = \sigma_{\min}(T_{22} - \lambda I) \le \min_{\mu \in \lambda(T_{22})} |\mu - \lambda|$$

and the upper bound can be a gross overestimate.
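How gross the overestimate can be is easy to see with a hypothetical $T_{22} = [1\ \beta;\ 0\ 2]$ and $\lambda = 0$: the eigenvalue gap stays at 1 for every $\beta$, while $\operatorname{sep}(\lambda, T_{22}) = \sigma_{\min}(T_{22} - \lambda I)$ shrinks roughly like $2/\beta$. The sketch below is plain Python and computes $\sigma_{\min}$ of a 2-by-2 matrix in closed form from $\|P\|_F$ and $|\det P|$; the matrix family and the helper name `smin2` are assumptions made for illustration.

```python
import math

def smin2(P):
    """Smallest singular value of a real 2-by-2 matrix (closed form)."""
    fro2 = sum(v * v for row in P for v in row)
    det = abs(P[0][0] * P[1][1] - P[0][1] * P[1][0])
    s_max = math.sqrt((fro2 + math.sqrt(max(fro2 * fro2 - 4 * det * det, 0.0))) / 2)
    return det / s_max

lam = 0.0
seps, gaps = [], []
for beta in (0.0, 10.0, 1000.0):
    # Hypothetical T22 with eigenvalues 1 and 2; beta sets its departure from normality.
    T22 = [[1.0, beta], [0.0, 2.0]]
    shifted = [[T22[0][0] - lam, T22[0][1]], [T22[1][0], T22[1][1] - lam]]
    seps.append(smin2(shifted))
    gaps.append(min(abs(T22[0][0] - lam), abs(T22[1][1] - lam)))
    print("beta=%6g  sep(lam, T22)=%.3e  min|mu - lam|=%.1f" % (beta, seps[-1], gaps[-1]))
```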

That the separation of the eigenvalues should have a bearing upon eigenvector sensitivity should come as no surprise. Indeed, if $\lambda$ is a nondefective, repeated eigenvalue, then there are an infinite number of possible eigenvector bases for the associated invariant subspace. The preceding analysis merely indicates that this indeterminacy begins to be felt as the eigenvalues coalesce. In other words, the eigenvectors associated with nearby eigenvalues are "wobbly."


Example 7.2.6 If

$$A = \begin{bmatrix} 1.01 & 0.01 \\ 0.00 & 0.99 \end{bmatrix},$$

then the eigenvalue $\lambda = .99$ has condition $1/s(.99) \approx 1.118$ and associated eigenvector $x = [.4472, -.8944]^T$. On the other hand, the eigenvalue $\hat{\lambda} = 1.00$ of the "nearby" matrix

$$A + E = \begin{bmatrix} 1.01 & 0.01 \\ 0.00 & 1.00 \end{bmatrix}$$

has an eigenvector $\hat{x} = [.7071, -.7071]^T$.
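The numbers in Example 7.2.6 follow from a closed form: for an upper triangular $[a\ b;\ 0\ d]$ with $d \neq a$, the eigenvector for $d$ is proportional to $[b, d - a]^T$, and the left eigenvector for $d$ is $e_2$, so $s(d) = |x_2|$ for a unit $x$. The sketch below (plain Python; `unit_eigvec_for_d` is a hypothetical helper name) reproduces $x$, $\hat{x}$, and $1/s(.99)$, and reports the angle between the two eigenvectors.

```python
import math

def unit_eigvec_for_d(a, b, d):
    """Unit eigenvector of the upper triangular [[a, b], [0, d]] for eigenvalue d (d != a)."""
    nrm = math.hypot(b, d - a)
    return (b / nrm, (d - a) / nrm)

x = unit_eigvec_for_d(1.01, 0.01, 0.99)       # A, lambda = .99
x_hat = unit_eigvec_for_d(1.01, 0.01, 1.00)   # A + E, lambda = 1.00
cond = 1 / abs(x[1])   # left eigenvector for d is e_2, so s(d) = |x_2|
angle = math.degrees(math.acos(abs(x[0] * x_hat[0] + x[1] * x_hat[1])))
print(x, x_hat, cond, angle)
```

A perturbation of size $.01$ turns the eigenvector through roughly 18 degrees, even though the eigenvalue itself is well conditioned.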


Problems 


P7.2.1 Suppose $Q^H A Q = \operatorname{diag}(\lambda_i) + N$ is a Schur decomposition of $A \in \mathbb{C}^{n\times n}$ and define $\nu(A) = \|A^HA - AA^H\|_F$. The upper and lower bounds in

$$\frac{\nu(A)^2}{6\|A\|_F^2} \le \|N\|_F^2 \le \sqrt{\frac{n^3 - n}{12}}\,\nu(A)$$

are established by Henrici (1962) and Eberlein (1965), respectively. Verify these results for the case $n = 2$.


P7.2.2 Suppose $A \in \mathbb{C}^{n\times n}$ and $X^{-1}AX = \operatorname{diag}(\lambda_1, \ldots, \lambda_n)$ with distinct $\lambda_i$. Show that if the columns of $X$ have unit 2-norm, then $\kappa_F(X)^2 = n\sum_{i=1}^{n} (1/s(\lambda_i))^2$.

P7.2.3 Suppose $Q^H A Q = \operatorname{diag}(\lambda_i) + N$ is a Schur decomposition of $A$ and that $X^{-1}AX = \operatorname{diag}(\lambda_i)$. Show $\kappa_2(X)^2 \ge 1 + (\|N\|_F/\|A\|_F)^2$. See Loizou (1969).

P7.2.4 If $X^{-1}AX = \operatorname{diag}(\lambda_i)$ and $|\lambda_1| \ge \cdots \ge |\lambda_n|$, then

$$\frac{\sigma_i(A)}{\kappa_2(X)} \le |\lambda_i| \le \kappa_2(X)\,\sigma_i(A).$$

Prove this result for the $n = 2$ case. See Ruhe (1975).


P7.2.5 Show that if $A = \begin{bmatrix} a & c \\ 0 & b \end{bmatrix}$ and $a \neq b$, then $s(a) = s(b) = (1 + |c/(a - b)|^2)^{-1/2}$.
P7.2.6 Suppose

$$A = \begin{bmatrix} \lambda & v^H \\ 0 & T_{22} \end{bmatrix}$$

and that $\lambda \notin \lambda(T_{22})$. Show that if $\sigma = \operatorname{sep}(\lambda, T_{22})$, then

$$s(\lambda) = \frac{1}{\sqrt{1 + \|(T_{22} - \lambda I)^{-H}v\|_2^2}} \ge \frac{\sigma}{\sqrt{\sigma^2 + \|v\|_2^2}}.$$

P7.2.7 Show that the condition of a simple eigenvalue is preserved under unitary similarity transformations.


P7.2.8 With the same hypothesis as in the Bauer-Fike theorem (Theorem 7.2.2), show that

$$\min_{\lambda \in \lambda(A)} |\lambda - \mu| \le \|\,|X^{-1}|\,|E|\,|X|\,\|_p.$$


P7.2.9 Verify (7.2.5). 


P7.2.10 Show that if $B \in \mathbb{C}^{m\times m}$ and $C \in \mathbb{C}^{n\times n}$, then $\operatorname{sep}(B, C)$ is less than or equal to $|\lambda - \mu|$ for all $\lambda \in \lambda(B)$ and $\mu \in \lambda(C)$.


Notes and References for Sec. 7.2 


Many of the results presented in this section may be found in Wilkinson (1965, chapter 
2), Stewart and Sun (1990) as well as in 


F.L. Bauer and C.T. Fike (1960). "Norms and Exclusion Theorems," Numer. Math. 2, 123-44.

A.S. Householder (1964). The Theory of Matrices in Numerical Analysis. Blaisdell, 
New York. 


The following papers are concerned with the effect of perturbations on the eigenvalues 
of a general matrix: 


A. Ruhe (1970). “Perturbation Bounds for Means of Eigenvalues and Invariant Sub- 
spaces,” BIT 10, 343-54. 

A. Ruhe (1970). “Properties of a Matrix with a Very Ill-Conditioned Eigenproblem," 
Numer. Math. 15, 57-60. 

J.H. Wilkinson (1972). “Note on Matrices with a Very Ill-Conditioned Eigenproblem," 
Numer. Math. 19, 176-78. 

W. Kahan, B.N. Parlett, and E. Jiang (1982). "Residual Bounds on Approximate Eigensystems of Nonnormal Matrices," SIAM J. Numer. Anal. 19, 470-484.

J.H. Wilkinson (1984). “On Neighboring Matrices with Quadratic Elementary Divisors,” 
Numer. Math. 44, 1-21. 

J.V. Burke and M.L. Overton (1992). "Stable Perturbations of Nonsymmetric Matrices," Lin. Alg. and Its Applic. 171, 249-273.


Wilkinson’s work on nearest defective matrices is typical of a growing body of literature 
that is concerned with “nearness” problems. See 


N.J. Higham (1985). “Nearness Problems in Numerical Linear Algebra,” PhD Thesis, 
University of Manchester, England. 

C. Van Loan (1985). “How Near is a Stable Matrix to an Unstable Matrix?,” Contem- 
porary Mathematics, Vol. 47, 465-477. 

J.W. Demmel (1987). “On the Distance to the Nearest Ill-Posed Problem,” Numer. 
Math. 51, 251-289. 
J.W. Demmel (1987). "A Counterexample for two Conjectures About Stability," IEEE Trans. Auto. Cont. AC-32, 340-342.


A. Ruhe (1987). “Closest Normal Matrix Found!,” BIT 27, 585-598. 

R. Byers (1988). “A Bisection Method for Measuring the Distance of a Stable Matrix to 
the Unstable Matrices,” SIAM J. Sci. and Stat. Comp. 9, 875-881. 

J.W. Demmel (1988). "The Probability that a Numerical Analysis Problem is Difficult," Math. Comp. 50, 449-480.

N.J. Higham (1989). "Matrix Nearness Problems and Applications," in Applications of Matrix Theory, M.J.C. Gover and S. Barnett (eds), Oxford University Press, Oxford UK, 1-27.


Aspects of eigenvalue condition are discussed in 


C. Van Loan (1987). “On Estimating the Condition of Eigenvalues and Eigenvectors,” Lin. Alg. and Its Applic. 88/89, 715-732.

C.D. Meyer and G.W. Stewart (1988). “Derivatives and Perturbations of Eigenvectors,” 
SIAM J. Num. Anal. 25, 679-691. 

G.W. Stewart and G. Zhang (1991). “Eigenvalues of Graded Matrices and the Condition 
Numbers of Multiple Eigenvalues,” Numer. Math. 58, 703-712. 

J.-G. Sun (1992). “On Condition Numbers of a Nondefective Multiple Eigenvalue,” 
Numer. Math. 61, 265-276. 


The relationship between the eigenvalue condition number, the departure from normality, and the condition of the eigenvector matrix is discussed in


P. Henrici (1962). “Bounds for Iterates, Inverses, Spectral Variation and Fields of Values of Non-normal Matrices,” Numer. Math. 4, 24-40.

P. Eberlein (1965). “On Measures of Non-Normality for Matrices,” Amer. Math. Soc. Monthly 72, 995-96.

R.A. Smith (1967). “The Condition Numbers of the Matrix Eigenvalue Problem,” Numer. Math. 10, 232-40.

G. Loizou (1969). “Nonnormality and Jordan Condition Numbers of Matrices,” J. ACM 16, 580-40.

A. van der Sluis (1975). “Perturbations of Eigenvalues of Non-normal Matrices,” Comm. ACM 18, 30-36.


The paper by Henrici also contains a result similar to Theorem 7.2.3. Penetrating treatments of invariant subspace perturbation include


T. Kato (1966). Perturbation Theory for Linear Operators, Springer-Verlag, New York. 

C. Davis and W.M. Kahan (1970). “The Rotation of Eigenvectors by a Perturbation, 
III,” SIAM J. Num. Anal. 7, 1-46. 

G.W. Stewart (1971). “Error Bounds for Approximate Invariant Subspaces of Closed Linear Operators,” SIAM J. Num. Anal. 8, 796-808.

G.W. Stewart (1973). “Error and Perturbation Bounds for Subspaces Associated with 
Certain Eigenvalue Problems,” SIAM Review 15, 727-64. 


Detailed analyses of the function sep(.,.) and the map $X \mapsto AX + XA^T$ are given in


J. Varah (1979). “On the Separation of Two Matrices,” SIAM J. Num. Anal. 16, 
216-22. 

R. Byers and S.G. Nash (1987). “On the Singular Vectors of the Lyapunov Operator,” SIAM J. Alg. and Disc. Methods 8, 59-66.


Gershgorin’s Theorem can be used to derive a comprehensive perturbation theory. See 
Wilkinson (1965, chapter 2). The theorem itself can be generalized and extended in 
various ways; see 


330 CHAPTER 7. THE UNSYMMETRIC EIGENVALUE PROBLEM 


R.S. Varga (1970). “Minimal Gershgorin Sets for Partitioned Matrices,” SIAM J. Num. 
Anal. 7, 493-507. 


R.J. Johnston (1971). “Gershgorin Theorems for Partitioned Matrices,” Lin. Alg. and 
Its Applic. 4, 205-20. 


7.3 Power Iterations 


Suppose that we are given $A \in \mathbb{C}^{n\times n}$ and a unitary $U_0 \in \mathbb{C}^{n\times n}$. Assume that Householder orthogonalization (Algorithm 5.2.1) can be extended to complex matrices (it can) and consider the following iteration:

  $T_0 = U_0^H A U_0$
  for $k = 1, 2, \ldots$
    $T_{k-1} = U_k R_k$  (QR factorization)      (7.3.1)
    $T_k = R_k U_k$
  end

Since $T_k = R_k U_k = U_k^H (U_k R_k) U_k = U_k^H T_{k-1} U_k$, it follows by induction that

$$T_k = (U_0 U_1 \cdots U_k)^H A (U_0 U_1 \cdots U_k). \tag{7.3.2}$$

Thus, each $T_k$ is unitarily similar to $A$. Not so obvious, and what is the central theme of this section, is that the $T_k$ almost always converge to upper triangular form. That is, (7.3.2) almost always “converges” to a Schur decomposition of $A$.

Iteration (7.3.1) is called the QR iteration, and it forms the backbone 
of the most effective algorithm for computing the Schur decomposition. 
In order to motivate the method and to derive its convergence properties, 
two other eigenvalue iterations that are important in their own right are 
presented first: the power method and the method of orthogonal iteration. 


7.3.1 The Power Method 


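Suppose $A \in \mathbb{C}^{n\times n}$ is diagonalizable, that $X^{-1}AX = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$ with $X = [x_1, \ldots, x_n]$, and $|\lambda_1| > |\lambda_2| \ge \cdots \ge |\lambda_n|$. Given a unit 2-norm $q^{(0)} \in \mathbb{C}^n$, the power method produces a sequence of vectors $q^{(k)}$ as follows:

  for $k = 1, 2, \ldots$
    $z^{(k)} = A q^{(k-1)}$
    $q^{(k)} = z^{(k)} / \| z^{(k)} \|_2$      (7.3.3)
    $\lambda^{(k)} = [q^{(k)}]^H A q^{(k)}$
  end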


There is nothing special about doing a 2-norm normalization except that it imparts a greater unity on the overall discussion in this section.

Let us examine the convergence properties of the power iteration. If

$$q^{(0)} = a_1 x_1 + a_2 x_2 + \cdots + a_n x_n$$

and $a_1 \ne 0$, then it follows that

$$A^k q^{(0)} = a_1 \lambda_1^k \left( x_1 + \sum_{j=2}^{n} \frac{a_j}{a_1} \left( \frac{\lambda_j}{\lambda_1} \right)^k x_j \right).$$

Since $q^{(k)} \in \mathrm{span}\{A^k q^{(0)}\}$ we conclude that

$$\mathrm{dist}\left(\mathrm{span}\{q^{(k)}\}, \mathrm{span}\{x_1\}\right) = O\left( \left| \frac{\lambda_2}{\lambda_1} \right|^k \right)$$

and moreover,

$$|\lambda_1 - \lambda^{(k)}| = O\left( \left| \frac{\lambda_2}{\lambda_1} \right|^k \right).$$

If $|\lambda_1| > |\lambda_2| \ge \cdots \ge |\lambda_n|$, then we say that $\lambda_1$ is a dominant eigenvalue. Thus, the power method converges if $\lambda_1$ is dominant and if $q^{(0)}$ has a component in the direction of the corresponding dominant eigenvector $x_1$. The behavior of the iteration without these assumptions is discussed in Wilkinson (1965, p.570) and Parlett and Poole (1973).

Example 7.3.1 If

$$A = \begin{bmatrix} -261 & 209 & -49 \\ -530 & 422 & -98 \\ -800 & 631 & -144 \end{bmatrix}$$

then $\lambda(A) = \{10, 4, 3\}$. Applying (7.3.3) with $q^{(0)} = [1, 0, 0]^T$ we find

  k    $\lambda^{(k)}$
  1    13.0606
  2    10.7191
  3    10.2073
  4    10.0633
  5    10.0198
  6    10.0063
  7    10.0020
  8    10.0007
  9    10.0002
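A minimal pure-Python sketch of (7.3.3) that reruns Example 7.3.1. It assumes real data, so $[q^{(k)}]^H$ is just the transpose, and stores the matrix densely as a list of rows; the name `power_method` is our own illustrative choice. The estimates should move toward 10 as in the table, consistent with the $O(|\lambda_2/\lambda_1|^k)$ bound above.

```python
import math


def power_method(A, q0, num_iters):
    """Power method (7.3.3) for a real n-by-n matrix A stored as a list of rows.

    Illustrative helper (our own name). Returns the estimates
    lambda^(k) = q^(k)^T A q^(k) for k = 1..num_iters and the final unit iterate."""
    n = len(A)
    q = list(q0)
    estimates = []
    for _ in range(num_iters):
        z = [sum(A[i][j] * q[j] for j in range(n)) for i in range(n)]  # z = A q
        norm_z = math.sqrt(sum(t * t for t in z))
        q = [t / norm_z for t in z]                                     # 2-norm normalization
        Aq = [sum(A[i][j] * q[j] for j in range(n)) for i in range(n)]
        estimates.append(sum(q[i] * Aq[i] for i in range(n)))           # real data: ^H is ^T
    return estimates, q


# Matrix of Example 7.3.1, eigenvalues {10, 4, 3}.
A = [[-261.0, 209.0, -49.0],
     [-530.0, 422.0, -98.0],
     [-800.0, 631.0, -144.0]]

estimates, q = power_method(A, [1.0, 0.0, 0.0], 9)
for k, lam in enumerate(estimates, start=1):
    print(f"{k}  {lam:.4f}")
```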


In practice, the usefulness of the power method depends upon the ratio $|\lambda_2|/|\lambda_1|$, since it dictates the rate of convergence. The danger that $q^{(0)}$ is deficient in $x_1$ is a less worrisome matter because rounding errors sustained during the iteration typically ensure that the subsequent $q^{(k)}$ have a component in this direction. Moreover, it is typically the case in applications where the dominant eigenvalue and eigenvector are desired that an a priori estimate of $x_1$ is known. Normally, by setting $q^{(0)}$ to be this estimate, the dangers of a small $a_1$ are minimized.

Note that the only thing required to implement the power method is a 
subroutine capable of computing matrix-vector products of the form Aq. 
It is not necessary to store A in an n-by-n array. For this reason, the 
algorithm can be of interest when A is large and sparse and when there is 
a sufficient gap between $|\lambda_1|$ and $|\lambda_2|$.
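Because (7.3.3) touches $A$ only through products $Aq$, it can be written against a callback. The following is a minimal sketch, assuming a hypothetical callback `matvec(x)` that returns $Ax$. As a stand-in for a large sparse matrix it uses tridiag(-1, 2, -1), which is never stored and whose largest eigenvalue, $2 + 2\cos(\pi/(n+1))$, is known in closed form. For $n = 10$ the ratio $|\lambda_2/\lambda_1|$ is about 0.94, so many steps are needed. This illustrates why a sufficient gap matters.

```python
import math


def power_method_matvec(matvec, q0, num_iters):
    """Power method (7.3.3) that accesses A only through matvec(x) = A x.
    Illustrative helper (our own name)."""
    norm0 = math.sqrt(sum(t * t for t in q0))
    q = [t / norm0 for t in q0]
    lam = None
    for _ in range(num_iters):
        z = matvec(q)
        norm_z = math.sqrt(sum(t * t for t in z))
        q = [t / norm_z for t in z]
        lam = sum(a * b for a, b in zip(q, matvec(q)))  # Rayleigh quotient q^T A q
    return lam, q


def tridiag_matvec(x):
    """y = T x for T = tridiag(-1, 2, -1); T itself is never stored."""
    n = len(x)
    y = []
    for i in range(n):
        s = 2.0 * x[i]
        if i > 0:
            s -= x[i - 1]
        if i < n - 1:
            s -= x[i + 1]
        y.append(s)
    return y


n = 10
lam, q = power_method_matvec(tridiag_matvec, [1.0] + [0.0] * (n - 1), 500)
exact = 2.0 + 2.0 * math.cos(math.pi / (n + 1))   # largest eigenvalue of T
print(lam, exact)
```

The starting vector is $e_1$ rather than the vector of ones. For even $n$ the vector of ones happens to be orthogonal to the dominant eigenvector of this particular matrix.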

Estimates for the error $|\lambda^{(k)} - \lambda_1|$ can be obtained by applying the perturbation theory developed in the previous section. Define the vector $r^{(k)} = Aq^{(k)} - \lambda^{(k)} q^{(k)}$ and observe that $(A + E^{(k)}) q^{(k)} = \lambda^{(k)} q^{(k)}$ where $E^{(k)} = -r^{(k)} [q^{(k)}]^H$. Thus $\lambda^{(k)}$ is an eigenvalue of $A + E^{(k)}$ and

$$|\lambda^{(k)} - \lambda_1| \approx \frac{\| E^{(k)} \|_2}{s(\lambda_1)} = \frac{\| r^{(k)} \|_2}{s(\lambda_1)}.$$

If we use the power method to generate approximate right and left dominant eigenvectors, then it is possible to obtain an estimate of $s(\lambda_1)$. In particular, if $w^{(k)}$ is a unit 2-norm vector in the direction of $(A^H)^k w^{(0)}$, then we can use the approximation $s(\lambda_1) \approx |[w^{(k)}]^H q^{(k)}|$.
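A sketch of this residual-based estimate on the matrix of Example 7.3.1, with $q^{(0)} = w^{(0)} = e_1$ (our choice) and real data, so $A^H = A^T$. A hand computation gives right and left eigenvectors $x_1 = [1, 2, 3]^T$ and $y_1 = [230, -181, 42]^T$ for $\lambda_1 = 10$. Hence $s(\lambda_1) = 6/\sqrt{14 \cdot 87425} \approx 0.0054$, so $\lambda_1$ is quite ill-conditioned. The estimate $|[w^{(k)}]^T q^{(k)}|$ should approach that value, and $\| r^{(k)} \|_2 / s(\lambda_1)$ is the first-order error estimate.

```python
import math


def matvec(M, x):
    return [sum(a * b for a, b in zip(row, x)) for row in M]


def normalize(x):
    nx = math.sqrt(sum(t * t for t in x))
    return [t / nx for t in x]


def power_error_estimate(A, num_iters):
    """Run the power method on A and on A^T side by side (illustrative helper).

    Returns (lambda_k, ||r_k||_2, s_est) with r_k = A q_k - lambda_k q_k and
    s_est = |w_k^T q_k| as an estimate of s(lambda_1)."""
    At = [list(col) for col in zip(*A)]
    q = normalize([1.0] + [0.0] * (len(A) - 1))
    w = list(q)
    for _ in range(num_iters):
        q = normalize(matvec(A, q))    # right iterate q^(k)
        w = normalize(matvec(At, w))   # left iterate w^(k)
    Aq = matvec(A, q)
    lam = sum(a * b for a, b in zip(q, Aq))
    res = math.sqrt(sum((a - lam * b) ** 2 for a, b in zip(Aq, q)))
    s_est = abs(sum(a * b for a, b in zip(w, q)))
    return lam, res, s_est


A = [[-261.0, 209.0, -49.0],
     [-530.0, 422.0, -98.0],
     [-800.0, 631.0, -144.0]]
for k in (5, 10, 20):
    lam, res, s_est = power_error_estimate(A, k)
    # Last column: first-order estimate of |lambda^(k) - lambda_1|.
    print(k, lam, res, s_est, res / s_est)
```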


7.3.2 Orthogonal Iteration 


A straightforward generalization of the power method can be used to compute higher-dimensional invariant subspaces. Let $r$ be a chosen integer satisfying $1 \le r \le n$. Given an n-by-r matrix $Q_0$ with orthonormal columns, the method of orthogonal iteration generates a sequence of matrices $\{Q_k\} \subseteq \mathbb{C}^{n\times r}$ as follows:

  for $k = 1, 2, \ldots$
    $Z_k = A Q_{k-1}$      (7.3.4)
    $Q_k R_k = Z_k$  (QR factorization)
  end

Note that if $r = 1$, then this is just the power method. Moreover, the sequence $\{Q_k e_1\}$ is precisely the sequence of vectors produced by the power iteration with starting vector $q^{(0)} = Q_0 e_1$.

In order to analyze the behavior of this iteration, suppose that

$$Q^H A Q = T = \mathrm{diag}(\lambda_i) + N, \qquad |\lambda_1| \ge |\lambda_2| \ge \cdots \ge |\lambda_n| \tag{7.3.5}$$

is a Schur decomposition of $A \in \mathbb{C}^{n\times n}$. Assume that $1 \le r < n$ and partition $Q$, $T$, and $N$ as follows:

$$Q = [\, Q_\alpha \;\; Q_\beta \,], \qquad T = \begin{bmatrix} T_{11} & T_{12} \\ 0 & T_{22} \end{bmatrix}, \qquad N = \begin{bmatrix} N_{11} & N_{12} \\ 0 & N_{22} \end{bmatrix} \tag{7.3.6}$$

where $Q_\alpha$ has $r$ columns and $T_{11}$ and $N_{11}$ are r-by-r.

If $|\lambda_r| > |\lambda_{r+1}|$, then the subspace $D_r(A) = \mathrm{ran}(Q_\alpha)$ is said to be a dominant invariant subspace. It is the unique invariant subspace associated with the eigenvalues $\lambda_1, \ldots, \lambda_r$. The following theorem shows that with reasonable assumptions, the subspaces $\mathrm{ran}(Q_k)$ generated by (7.3.4) converge to $D_r(A)$ at a rate proportional to $|\lambda_{r+1}/\lambda_r|^k$.


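Theorem 7.3.1 Let the Schur decomposition of $A \in \mathbb{C}^{n\times n}$ be given by (7.3.5) and (7.3.6) with $n \ge 2$. Assume that $|\lambda_r| > |\lambda_{r+1}|$ and that $\theta \ge 0$ satisfies

$$(1+\theta)|\lambda_r| > \| N \|_F.$$

If $Q_0 \in \mathbb{C}^{n\times r}$ has orthonormal columns and

$$d = \mathrm{dist}(D_r(A^H), \mathrm{ran}(Q_0)) < 1,$$

then the matrices $Q_k$ generated by (7.3.4) satisfy

$$\mathrm{dist}(D_r(A), \mathrm{ran}(Q_k)) \le \frac{(1+\theta)^{n-2}}{\sqrt{1-d^2}} \left( 1 + \frac{\| T_{12} \|_F}{\mathrm{sep}(T_{11}, T_{22})} \right) \left( \frac{|\lambda_{r+1}| + \| N \|_F/(1+\theta)}{|\lambda_r| - \| N \|_F/(1+\theta)} \right)^k.$$

Proof. The proof is given in an appendix at the end of this section. □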


The condition $d < 1$ in Theorem 7.3.1 ensures that the initial $Q$ matrix is not deficient in certain eigendirections:

$$d < 1 \iff D_r(A^H)^{\perp} \cap \mathrm{ran}(Q_0) = \{0\}.$$

The theorem essentially says that if this condition holds and if $\theta$ is chosen large enough, then

$$\mathrm{dist}(D_r(A), \mathrm{ran}(Q_k)) \le c \left| \frac{\lambda_{r+1}}{\lambda_r} \right|^k$$

where $c$ depends on $\mathrm{sep}(T_{11}, T_{22})$ and $A$'s departure from normality. Needless to say, convergence can be very slow if the gap between $|\lambda_r|$ and $|\lambda_{r+1}|$ is not sufficiently wide.


Example 7.3.2 If (7.3.4) is applied to the matrix $A$ in Example 7.3.1, with $Q_0 = [e_1, e_2]$, we find:

  k    $\mathrm{dist}(D_2(A), \mathrm{ran}(Q_k))$
  1    .0052
  2    .0047
  3    .0039
  4    .0030
  5    .0023
  6    .0017
  7    .0013




The error is tending to zero with rate $|\lambda_3/\lambda_2|^k = (3/4)^k$.
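A sketch of (7.3.4) that reruns Example 7.3.2. To measure the distance it uses the left eigenvector $y_3 = [5, -4, 1]^T$ of $\lambda_3 = 3$; a hand computation confirms $A^T y_3 = 3 y_3$. Since $y_3$ is orthogonal to the eigenvectors for 10 and 4, $D_2(A) = \mathrm{span}\{y_3\}^\perp$, and $\mathrm{dist}(D_2(A), \mathrm{ran}(Q_k)) = \| Q_k^T y_3 \|_2 / \| y_3 \|_2$. The QR step uses modified Gram-Schmidt with one reorthogonalization pass. This is our choice for brevity, not the Householder approach of the text.

```python
import math


def matmul(X, Y):
    return [[sum(X[i][p] * Y[p][j] for p in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def orthonormalize(Z):
    """Q factor of a thin QR factorization of Z (n-by-r, list of rows) computed by
    modified Gram-Schmidt with one reorthogonalization pass (illustrative helper)."""
    n, r = len(Z), len(Z[0])
    cols = []
    for j in range(r):
        v = [Z[i][j] for i in range(n)]
        for _ in range(2):                     # second pass restores orthogonality
            for qc in cols:
                c = sum(a * b for a, b in zip(qc, v))
                v = [a - c * b for a, b in zip(v, qc)]
        nv = math.sqrt(sum(t * t for t in v))
        cols.append([t / nv for t in v])
    return [[cols[j][i] for j in range(r)] for i in range(n)]


A = [[-261.0, 209.0, -49.0],
     [-530.0, 422.0, -98.0],
     [-800.0, 631.0, -144.0]]
y3 = [5.0, -4.0, 1.0]                  # A^T y3 = 3 y3, so D_2(A) = span{y3}^perp
norm_y3 = math.sqrt(sum(t * t for t in y3))


def dist_to_D2(Q):
    """dist(D_2(A), ran(Q)) = ||Q^T y3||_2 / ||y3||_2 for a 3-by-2 orthonormal Q."""
    proj = [sum(Q[i][j] * y3[i] for i in range(3)) for j in range(2)]
    return math.sqrt(sum(t * t for t in proj)) / norm_y3


Q = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]   # Q_0 = [e1, e2]
dists = []
for k in range(40):
    Q = orthonormalize(matmul(A, Q))       # Z_k = A Q_{k-1}, then Q_k R_k = Z_k
    dists.append(dist_to_D2(Q))
for k in range(7):
    print(k + 1, f"{dists[k]:.4f}")
```

Later ratios `dists[k] / dists[k-1]` settle near 3/4, the rate quoted above.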


It is possible to accelerate the convergence in orthogonal iteration using a technique described in Stewart (1976). In the accelerated scheme, the approximate eigenvalue $\lambda_i^{(k)}$ satisfies

$$|\lambda_i^{(k)} - \lambda_i| \approx \left| \frac{\lambda_{r+1}}{\lambda_i} \right|^k, \qquad i = 1{:}r.$$

(Without the acceleration, the right-hand side is $|\lambda_{i+1}/\lambda_i|^k$.) Stewart's algorithm involves computing the Schur decomposition of the matrices $Q_k^H A Q_k$ every so often. The method can be very useful in situations where $A$ is large and sparse and a few of its largest eigenvalues are required.


7.3.3 The QR Iteration 


We now “derive” the QR iteration (7.3.1) and examine its convergence. Suppose $r = n$ in (7.3.4) and the eigenvalues of $A$ satisfy

$$|\lambda_1| > |\lambda_2| > \cdots > |\lambda_n|.$$

Partition the matrix $Q$ in (7.3.5) and $Q_k$ in (7.3.4) as follows:

$$Q = [\, q_1, \ldots, q_n \,], \qquad Q_k = [\, q_1^{(k)}, \ldots, q_n^{(k)} \,].$$

If

$$\mathrm{dist}\left(D_i(A^H), \mathrm{span}\{q_1^{(0)}, \ldots, q_i^{(0)}\}\right) < 1, \qquad i = 1{:}n, \tag{7.3.7}$$

then it follows from Theorem 7.3.1 that

$$\mathrm{dist}\left(\mathrm{span}\{q_1^{(k)}, \ldots, q_i^{(k)}\}, \mathrm{span}\{q_1, \ldots, q_i\}\right) \to 0$$

for $i = 1{:}n$. This implies that the matrices $T_k$ defined by

$$T_k = Q_k^H A Q_k$$

are converging to upper triangular form. Thus, it can be said that the method of orthogonal iteration computes a Schur decomposition provided the original iterate $Q_0 \in \mathbb{C}^{n\times n}$ is not deficient in the sense of (7.3.7).

The QR iteration arises naturally by considering how to compute the matrix $T_k$ directly from its predecessor $T_{k-1}$. On the one hand, we have from (7.3.4) and the definition of $T_{k-1}$ that

$$T_{k-1} = Q_{k-1}^H A Q_{k-1} = Q_{k-1}^H (A Q_{k-1}) = (Q_{k-1}^H Q_k) R_k.$$

On the other hand,

$$T_k = Q_k^H A Q_k = (Q_k^H A Q_{k-1})(Q_{k-1}^H Q_k) = R_k (Q_{k-1}^H Q_k).$$

Thus, $T_k$ is determined by computing the QR factorization of $T_{k-1}$ and then multiplying the factors together in reverse order. This is precisely what is done in (7.3.1).
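The derivation can be checked numerically. This sketch assumes real data and a QR helper that returns $R$ with a positive diagonal; the helper uses modified Gram-Schmidt with one reorthogonalization pass, which is our choice. With that normalization the QR factorization of a nonsingular matrix is unique. So (7.3.1) started from $U_0 = I$ and (7.3.4) started from $Q_0 = I$ with $r = n$ should produce the same $T_k$ up to rounding.

```python
import math


def qr(M):
    """QR of a nonsingular square matrix by modified Gram-Schmidt with one
    reorthogonalization pass; R has a positive diagonal (illustrative helper)."""
    n = len(M)
    cols, R = [], [[0.0] * n for _ in range(n)]
    for j in range(n):
        v = [M[i][j] for i in range(n)]
        for _ in range(2):
            for i, qc in enumerate(cols):
                c = sum(a * b for a, b in zip(qc, v))
                R[i][j] += c
                v = [a - c * b for a, b in zip(v, qc)]
        R[j][j] = math.sqrt(sum(t * t for t in v))
        cols.append([t / R[j][j] for t in v])
    return [[cols[j][i] for j in range(n)] for i in range(n)], R


def matmul(X, Y):
    return [[sum(X[i][p] * Y[p][j] for p in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    return [list(col) for col in zip(*X)]


A = [[-261.0, 209.0, -49.0],
     [-530.0, 422.0, -98.0],
     [-800.0, 631.0, -144.0]]
n = len(A)
T = [row[:] for row in A]                                   # (7.3.1) with U_0 = I
Qk = [[float(i == j) for j in range(n)] for i in range(n)]  # (7.3.4) with Q_0 = I, r = n
gaps = []
for k in range(1, 6):
    U, R = qr(T)
    T = matmul(R, U)                                        # T_k = R_k U_k
    Qk, _ = qr(matmul(A, Qk))                               # Q_k R_k = A Q_{k-1}
    T_orth = matmul(transpose(Qk), matmul(A, Qk))           # Q_k^T A Q_k
    gaps.append(max(abs(T[i][j] - T_orth[i][j]) for i in range(n) for j in range(n)))
print(gaps)
```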


Example 7.3.3 If the iteration:

  for k = 1, 2, ...
    A = QR
    A = RQ
  end

is applied to the matrix of Example 7.3.1, then the strictly lower triangular elements diminish: $|a_{21}|$, $|a_{31}|$, and $|a_{32}|$ decay roughly like $(4/10)^k$, $(3/10)^k$, and $(3/4)^k$, respectively.
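A sketch of the iteration in Example 7.3.3 that uses the same kind of Gram-Schmidt QR helper (our choice). It records $|a_{21}|$, $|a_{31}|$, $|a_{32}|$ after each step and reports a per-step decay factor, measured as a geometric mean over a window of steps. The windows are also our choice: each one keeps its entry well above rounding level. The factors should come out close to $|\lambda_i/\lambda_j|$ for the $(i, j)$ entry, that is, 4/10, 3/10, and 3/4.

```python
import math


def qr(M):
    """QR by modified Gram-Schmidt with reorthogonalization (illustrative helper)."""
    n = len(M)
    cols, R = [], [[0.0] * n for _ in range(n)]
    for j in range(n):
        v = [M[i][j] for i in range(n)]
        for _ in range(2):
            for i, qc in enumerate(cols):
                c = sum(a * b for a, b in zip(qc, v))
                R[i][j] += c
                v = [a - c * b for a, b in zip(v, qc)]
        R[j][j] = math.sqrt(sum(t * t for t in v))
        cols.append([t / R[j][j] for t in v])
    return [[cols[j][i] for j in range(n)] for i in range(n)], R


def matmul(X, Y):
    return [[sum(X[i][p] * Y[p][j] for p in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


A = [[-261.0, 209.0, -49.0],
     [-530.0, 422.0, -98.0],
     [-800.0, 631.0, -144.0]]
T = [row[:] for row in A]
history = []                                   # (|a21|, |a31|, |a32|) after each step
for _ in range(60):
    Q, R = qr(T)
    T = matmul(R, Q)                           # A = QR, A = RQ
    history.append((abs(T[1][0]), abs(T[2][0]), abs(T[2][1])))


def observed_rate(idx, k1, k2):
    """Geometric-mean decay factor per step of entry idx between steps k1 and k2."""
    return (history[k2][idx] / history[k1][idx]) ** (1.0 / (k2 - k1))


print("a21:", observed_rate(0, 10, 20))   # compare with 4/10
print("a31:", observed_rate(1, 8, 16))    # compare with 3/10
print("a32:", observed_rate(2, 30, 50))   # compare with 3/4
print("diagonal:", [T[i][i] for i in range(3)])
```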


Note that a single QR iteration is an $O(n^3)$ calculation. Moreover, since convergence is only linear (when it exists), it is clear that the method is a prohibitively expensive way to compute Schur decompositions. Fortunately these practical difficulties can be overcome, as we show in §7.4 and §7.5.


7.3.4 LR Iterations 


We conclude with some remarks about power iterations that rely on the LU factorization rather than the QR factorization. Let $G_0 \in \mathbb{C}^{n\times r}$ have rank $r$. Corresponding to (7.3.4) we have the following iteration:

  for $k = 1, 2, \ldots$
    $Z_k = A G_{k-1}$      (7.3.8)
    $Z_k = G_k R_k$  (LU factorization)
  end

Suppose $r = n$ and that we define the matrices $T_k$ by

$$T_k = G_k^{-1} A G_k. \tag{7.3.9}$$

It can be shown that if we set $L_0 = G_0$, then the $T_k$ can be generated as follows:

  $T_0 = L_0^{-1} A L_0$
  for $k = 1, 2, \ldots$      (7.3.10)
    $T_{k-1} = L_k R_k$  (LU factorization)
    $T_k = R_k L_k$
  end

Iterations (7.3.8) and (7.3.10) are known as treppeniteration and the LR iteration, respectively. Under reasonable assumptions, the $T_k$ converge to upper triangular form. To successfully implement either method, it is necessary to pivot. See Wilkinson (1965, p.602).
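A sketch of the LR iteration (7.3.10) with $L_0 = I$, using an unpivoted LU factorization. As the text stresses, a robust implementation must pivot. The sketch relies on its test matrix, tridiag(-1, 2, -1) of order 3 with eigenvalues $2+\sqrt{2}$, $2$, $2-\sqrt{2}$, being symmetric positive definite, for which the unpivoted factorizations exist at every step. The matrix and the helper names are our own choices.

```python
import math


def lu_nopivot(T):
    """Doolittle LU without pivoting: T = L R with L unit lower triangular.
    Illustrative helper; raises on a zero pivot (pivoting would be required)."""
    n = len(T)
    L = [[float(i == j) for j in range(n)] for i in range(n)]
    R = [list(row) for row in T]
    for k in range(n - 1):
        if R[k][k] == 0.0:
            raise ZeroDivisionError("zero pivot: this matrix needs pivoting")
        for i in range(k + 1, n):
            m = R[i][k] / R[k][k]
            L[i][k] = m
            R[i][k] = 0.0
            for j in range(k + 1, n):
                R[i][j] -= m * R[k][j]
    return L, R


def matmul(X, Y):
    return [[sum(X[i][p] * Y[p][j] for p in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


A = [[2.0, -1.0, 0.0],
     [-1.0, 2.0, -1.0],
     [0.0, -1.0, 2.0]]
T = [row[:] for row in A]          # L_0 = I, so T_0 = A
for _ in range(100):
    L, R = lu_nopivot(T)           # T_{k-1} = L_k R_k
    T = matmul(R, L)               # T_k = R_k L_k
print([round(T[i][i], 12) for i in range(3)])   # approaches 2+sqrt(2), 2, 2-sqrt(2)
```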


Appendix 


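In order to establish Theorem 7.3.1 we need the following lemma which is concerned with bounding the powers of a matrix and its inverse.


Lemma 7.3.2 Let $Q^H A Q = T = D + N$ be a Schur decomposition of $A \in \mathbb{C}^{n\times n}$ where $D$ is diagonal and $N$ strictly upper triangular. Let $\lambda$ and $\mu$ denote the largest and smallest eigenvalues of $A$ in absolute value. If $\theta \ge 0$ then for all $k \ge 0$ we have

$$\| A^k \|_2 \le (1+\theta)^{n-1} \left( |\lambda| + \frac{\| N \|_F}{1+\theta} \right)^k. \tag{7.3.11}$$

If $A$ is nonsingular and $\theta \ge 0$ satisfies $(1+\theta)|\mu| > \| N \|_F$, then for all $k \ge 0$ we also have

$$\| A^{-k} \|_2 \le (1+\theta)^{n-1} \left( \frac{1}{|\mu| - \| N \|_F/(1+\theta)} \right)^k. \tag{7.3.12}$$


Proof. For $\theta \ge 0$, define the diagonal matrix $\Delta$ by

$$\Delta = \mathrm{diag}\left(1, (1+\theta), (1+\theta)^2, \ldots, (1+\theta)^{n-1}\right)$$

and note that $\kappa_2(\Delta) = (1+\theta)^{n-1}$. Since $N$ is strictly upper triangular, it is easy to verify that $\| \Delta N \Delta^{-1} \|_F \le \| N \|_F/(1+\theta)$. Thus,

$$\begin{aligned} \| A^k \|_2 = \| T^k \|_2 &= \| \Delta^{-1} (D + \Delta N \Delta^{-1})^k \Delta \|_2 \\ &\le \kappa_2(\Delta) \left( \| D \|_2 + \| \Delta N \Delta^{-1} \|_2 \right)^k \\ &\le (1+\theta)^{n-1} \left( |\lambda| + \frac{\| N \|_F}{1+\theta} \right)^k. \end{aligned}$$

On the other hand, if $A$ is nonsingular and $(1+\theta)|\mu| > \| N \|_F$, then $\| \Delta D^{-1} N \Delta^{-1} \|_2 < 1$ and using Lemma 2.3.3 we obtain

$$\begin{aligned} \| A^{-k} \|_2 = \| T^{-k} \|_2 &= \| \Delta^{-1} \left[ (I + \Delta D^{-1} N \Delta^{-1})^{-1} D^{-1} \right]^k \Delta \|_2 \\ &\le \kappa_2(\Delta) \left( \frac{\| D^{-1} \|_2}{1 - \| \Delta D^{-1} N \Delta^{-1} \|_2} \right)^k \\ &\le (1+\theta)^{n-1} \left( \frac{1}{|\mu| - \| N \|_F/(1+\theta)} \right)^k. \end{aligned}$$

□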
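A quick numerical check of (7.3.11) on a small upper triangular matrix. The matrix is our own choice; it is already a Schur form, with $Q = I$, $D = \mathrm{diag}(2, 1, 0.5)$, and $N$ its strictly upper part. The 2-norm is estimated from below by power iteration on $M^T M$, so any violation of the bound would show up in the comparison; none should.

```python
import math


def matmul(X, Y):
    return [[sum(X[i][p] * Y[p][j] for p in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def norm2_lower(M, iters=300):
    """Lower estimate of ||M||_2: power iteration on M^T M, then ||M x||_2 for the
    final unit vector x (||M x||_2 <= ||M||_2 for every unit x)."""
    m, n = len(M), len(M[0])
    x = [1.0 / math.sqrt(n)] * n
    for _ in range(iters):
        y = [sum(M[i][j] * x[j] for j in range(n)) for i in range(m)]  # y = M x
        z = [sum(M[i][j] * y[i] for i in range(m)) for j in range(n)]  # z = M^T y
        nz = math.sqrt(sum(t * t for t in z))
        if nz == 0.0:
            return 0.0
        x = [t / nz for t in z]
    y = [sum(M[i][j] * x[j] for j in range(n)) for i in range(m)]
    return math.sqrt(sum(t * t for t in y))


# Hypothetical test matrix already in Schur form (Q = I): T = D + N.
T = [[2.0, 3.0, 1.0],
     [0.0, 1.0, 4.0],
     [0.0, 0.0, 0.5]]
n = len(T)
lam = 2.0                                          # largest |eigenvalue|
N_F = math.sqrt(3.0 ** 2 + 1.0 ** 2 + 4.0 ** 2)    # ||N||_F


def bound_7_3_11(theta, k):
    return (1.0 + theta) ** (n - 1) * (lam + N_F / (1.0 + theta)) ** k


Tk = [[float(i == j) for j in range(n)] for i in range(n)]
rows = []
for k in range(1, 11):
    Tk = matmul(Tk, T)                             # T^k
    est = norm2_lower(Tk)
    for theta in (0.0, 1.0, 5.0):
        rows.append((k, theta, est, bound_7_3_11(theta, k)))
for k, theta, est, b in rows[-3:]:
    print(k, theta, est, b)
```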


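Proof of Theorem 7.3.1
It is easy to show by induction that $A^k Q_0 = Q_k (R_k \cdots R_1)$. By substituting (7.3.5) and (7.3.6) into this equality we obtain

$$T^k \begin{bmatrix} V_0 \\ W_0 \end{bmatrix} = \begin{bmatrix} V_k \\ W_k \end{bmatrix} (R_k \cdots R_1)$$

where $V_k = Q_\alpha^H Q_k$ and $W_k = Q_\beta^H Q_k$. Using Lemma 7.1.5 we know that a matrix $X \in \mathbb{C}^{r\times(n-r)}$ exists such that

$$\begin{bmatrix} I_r & X \\ 0 & I_{n-r} \end{bmatrix}^{-1} \begin{bmatrix} T_{11} & T_{12} \\ 0 & T_{22} \end{bmatrix} \begin{bmatrix} I_r & X \\ 0 & I_{n-r} \end{bmatrix} = \begin{bmatrix} T_{11} & 0 \\ 0 & T_{22} \end{bmatrix}$$

and so

$$\begin{bmatrix} T_{11}^k & 0 \\ 0 & T_{22}^k \end{bmatrix} \begin{bmatrix} V_0 - X W_0 \\ W_0 \end{bmatrix} = \begin{bmatrix} V_k - X W_k \\ W_k \end{bmatrix} (R_k \cdots R_1).$$

Below we establish that the matrix $V_0 - X W_0$ is nonsingular and this enables us to obtain the following expression:

$$W_k = T_{22}^k W_0 (V_0 - X W_0)^{-1} T_{11}^{-k} \, [\, I_r, \; -X \,] \begin{bmatrix} V_k \\ W_k \end{bmatrix}.$$

Recalling the definition of distance between subspaces from §2.6.3,

$$\mathrm{dist}(D_r(A), \mathrm{ran}(Q_k)) = \| Q_\beta^H Q_k \|_2 = \| W_k \|_2.$$

Since

$$\| [\, I_r, \; -X \,] \|_2 \le 1 + \| X \|_F$$

we have

$$\mathrm{dist}(D_r(A), \mathrm{ran}(Q_k)) \le \| T_{22}^k \|_2 \, \| (V_0 - X W_0)^{-1} \|_2 \, \| T_{11}^{-k} \|_2 \, (1 + \| X \|_F). \tag{7.3.13}$$

To prove the theorem we must look at each of the four factors in the upper bound.

Since $\mathrm{sep}(T_{11}, T_{22})$ is the smallest singular value of the linear transformation $\phi(X) = T_{11}X - XT_{22}$, it readily follows from $\phi(X) = -T_{12}$ that

$$\| X \|_F \le \frac{\| T_{12} \|_F}{\mathrm{sep}(T_{11}, T_{22})}. \tag{7.3.14}$$

Using Lemma 7.3.2, it can be shown that

$$\| T_{22}^k \|_2 \le (1+\theta)^{n-r-1} \left( |\lambda_{r+1}| + \frac{\| N \|_F}{1+\theta} \right)^k \tag{7.3.15}$$

and

$$\| T_{11}^{-k} \|_2 \le (1+\theta)^{r-1} \left( \frac{1}{|\lambda_r| - \| N \|_F/(1+\theta)} \right)^k. \tag{7.3.16}$$

Finally, we turn our attention to the $\| (V_0 - X W_0)^{-1} \|$ factor. Note that

$$\begin{aligned} V_0 - X W_0 &= Q_\alpha^H Q_0 - X Q_\beta^H Q_0 \\ &= [\, I_r, \; -X \,] \begin{bmatrix} Q_\alpha^H \\ Q_\beta^H \end{bmatrix} Q_0 \\ &= \left( Q_\alpha - Q_\beta X^H \right)^H Q_0 \\ &= (I_r + XX^H)^{1/2} (Z^H Q_0) \end{aligned}$$

where

$$Z = (Q_\alpha - Q_\beta X^H)(I_r + XX^H)^{-1/2}.$$

The columns of this matrix are orthonormal. They are also a basis for $D_r(A^H)$ because

$$A^H (Q_\alpha - Q_\beta X^H) = (Q_\alpha - Q_\beta X^H) T_{11}^H.$$

This last fact follows from the equation $A^H Q = Q T^H$. From Theorem 2.6.1

$$d = \mathrm{dist}(D_r(A^H), \mathrm{ran}(Q_0)) = \sqrt{1 - \sigma_{\min}(Z^H Q_0)^2}$$

and since $d < 1$ by hypothesis,

$$\sigma_{\min}(Z^H Q_0) > 0.$$

This shows that

$$V_0 - X W_0 = (I_r + XX^H)^{1/2} (Z^H Q_0)$$

is nonsingular and thus,

$$\| (V_0 - X W_0)^{-1} \|_2 \le \| (I_r + XX^H)^{-1/2} \|_2 \, \| (Z^H Q_0)^{-1} \|_2 \le 1/\sqrt{1 - d^2}. \tag{7.3.17}$$

The theorem follows by substituting (7.3.14)-(7.3.17) into (7.3.13). □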


Problems 


P7.3.1 (a) Show that if $X \in \mathbb{C}^{n\times n}$ is nonsingular, then $\| A \|_X = \| X^{-1} A X \|_2$ defines a matrix norm with the property that $\| AB \|_X \le \| A \|_X \| B \|_X$. (b) Let $A \in \mathbb{C}^{n\times n}$ and set $\rho = \max |\lambda_i|$. Show that for any $\epsilon > 0$ there exists a nonsingular $X \in \mathbb{C}^{n\times n}$ such that $\| A \|_X = \| X^{-1} A X \|_2 \le \rho + \epsilon$. Conclude that there is a constant $M$ such that $\| A^k \|_2 \le M (\rho + \epsilon)^k$ for all non-negative integers $k$. (Hint: Set $X = Q\,\mathrm{diag}(1, a, \ldots, a^{n-1})$ where $Q^H A Q = D + N$ is $A$'s Schur decomposition.)

P7.3.2 Verify that (7.3.10) calculates the matrices $T_k$ defined by (7.3.9).


P7.3.3 Suppose $A \in \mathbb{C}^{n\times n}$ is nonsingular and that $Q_0 \in \mathbb{C}^{n\times p}$ has orthonormal columns. The following iteration is referred to as inverse orthogonal iteration.

  for $k = 1, 2, \ldots$
    Solve $A Z_k = Q_{k-1}$ for $Z_k \in \mathbb{C}^{n\times p}$
    $Z_k = Q_k R_k$  (QR factorization)
  end

Explain why this iteration can usually be used to compute the $p$ smallest eigenvalues of $A$ in absolute value. Note that to implement this iteration it is necessary to be able to solve linear systems that involve $A$. When $p = 1$, the method is referred to as the inverse power method.


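P7.3.4 Assume $A \in \mathbb{R}^{n\times n}$ has eigenvalues $\lambda_1, \ldots, \lambda_n$ that satisfy

$$\lambda = \lambda_1 = \lambda_2 = \lambda_3 = \lambda_4 > |\lambda_5| \ge \cdots \ge |\lambda_n|$$

where $\lambda$ is positive. Assume that $A$ has two Jordan blocks of the form

$$\begin{bmatrix} \lambda & 1 \\ 0 & \lambda \end{bmatrix}.$$

Discuss the convergence properties of the power method when applied to this matrix. Discuss how the convergence might be accelerated.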


Notes and References for Sec. 7.3 


A detailed, practical discussion of the power method is given in Wilkinson (1965, chapter 10). Methods are discussed for accelerating the basic iteration, for calculating nondominant eigenvalues, and for handling complex conjugate eigenvalue pairs. The connections among the various power iterations are discussed in


B.N. Parlett and W.G. Poole (1973). “A Geometric Theory for the QR, LU, and Power Iterations,” SIAM J. Num. Anal. 10, 389-412.


The QR iteration was concurrently developed in 


J.G.F. Francis (1961). “The QR Transformation: A Unitary Analogue to the LR Transformation,” Comp. J. 4, 265-71, 332-34.

V.N. Kublanovskaya (1961). “On Some Algorithms for the Solution of the Complete 
Eigenvalue Problem,” USSR Comp. Math. Phys. 3, 637-57. 


As can be deduced from the title of the first paper, the LR iteration predates the QR iteration. The former very fundamental algorithm was proposed by

H. Rutishauser (1958). “Solution of Eigenvalue Problems with the LR Transformation,” Nat. Bur. Stand. App. Math. Ser. 49, 47-81.

B.N. Parlett (1995). “The New qd Algorithms,” Acta Numerica 5, 459-491.


Numerous papers on the convergence of the QR iteration have appeared. Several of these 
are 


J.H. Wilkinson (1965). “Convergence of the LR, QR, and Related Algorithms,” Comp. 
J. 8, 77-84. 

B.N. Parlett (1965). “Convergence of the Q-R Algorithm,” Numer. Math. 7, 187-93. 
(Correction in Numer. Math. 10, 163-64.) 

B.N. Parlett (1966). “Singular and Invariant Matrices Under the QR Algorithm,” Math. 
Comp. 20, 611-15. 

B.N. Parlett (1968). “Global Convergence of the Basic QR Algorithm on Hessenberg 
Matrices,” Math. Comp. 22, 803-17. 


Wilkinson (AEP, chapter 9) also discusses the convergence theory for this important 
algorithm. 

Deeper insight into the convergence of the QR algorithm and its connection to other 
important algorithms can be attained by reading 


D.S. Watkins (1982). “Understanding the QR Algorithm,” SIAM Review 24, 427-440.

T. Nanda (1985). “Differential Equations and the QR Algorithm,” SIAM J. Numer. 
Anal. 22, 310-321. 

D.S. Watkins (1993). “Some Perspectives on the Eigenvalue Problem,” SIAM Review 
35, 430-471. 


The following papers are concerned with various practical and theoretical aspects of simultaneous iteration:


H. Rutishauser (1970). “Simultaneous Iteration Method for Symmetric Matrices,” Numer. Math. 16, 205-23. See also Wilkinson and Reinsch (1971, pp. 284-302).

M. Clint and A. Jennings (1971). “A Simultaneous Iteration Method for the Unsymmetric Eigenvalue Problem,” J. Inst. Math. Applic. 8, 111-21.

A. Jennings and D.R.L. Orr (1971). “Application of the Simultaneous Iteration Method to Undamped Vibration Problems,” Int. J. Numer. Meth. Eng. 3, 13-24.

A. Jennings and W.J. Stewart (1975). “Simultaneous Iteration for the Partial Eigensolution of Real Matrices,” J. Inst. Math. Applic. 15, 351-62.

G.W. Stewart (1975). “Methods of Simultaneous Iteration for Calculating Eigenvectors of Matrices,” in Topics in Numerical Analysis II, ed. John J.H. Miller, Academic Press, New York, pp. 185-96.

G.W. Stewart (1976). “Simultaneous Iteration for Computing Invariant Subspaces of 
Non-Hermitian Matrices,” Numer. Math. 25, 123-36. 


See also chapter 10 of 


A. Jennings (1977). Matrix Computation for Engineers and Scientists, John Wiley and Sons, New York.


Simultaneous iteration and the Lanczos algorithm (cf. Chapter 9) are the principal methods for finding a few eigenvalues of a general sparse matrix.


7.4 The Hessenberg and Real Schur Forms


In this and the next section we show how to make the QR iteration (7.3.1) a fast, effective method for computing Schur decompositions. Because the majority of eigenvalue/invariant subspace problems involve real data, we concentrate on developing the real analog of (7.3.1) which we write as follows:

  $H_0 = U_0^T A U_0$
  for $k = 1, 2, \ldots$
    $H_{k-1} = U_k R_k$  (QR factorization)      (7.4.1)
    $H_k = R_k U_k$
  end

Here, $A \in \mathbb{R}^{n\times n}$, each $U_k \in \mathbb{R}^{n\times n}$ is orthogonal, and each $R_k \in \mathbb{R}^{n\times n}$ is upper triangular. A difficulty associated with this real iteration is that the $H_k$ can never converge to strict, “eigenvalue revealing,” triangular form in the event that $A$ has complex eigenvalues. For this reason, we must lower our expectations and be content with the calculation of an alternative decomposition known as the real Schur decomposition.

In order to compute the real Schur decomposition efficiently we must carefully choose the initial orthogonal similarity transformation $U_0$ in (7.4.1). In particular, if we choose $U_0$ so that $H_0$ is upper Hessenberg, then the amount of work per iteration is reduced from $O(n^3)$ to $O(n^2)$. The initial reduction to Hessenberg form (the $U_0$ computation) is a very important computation in its own right and can be realized by a sequence of Householder matrix operations.


7.4.1 The Real Schur Decomposition 


A block upper triangular matrix with either 1-by-1 or 2-by-2 diagonal blocks 
is upper quasi-triangular. The real Schur decomposition amounts to a real 
reduction to upper quasi-triangular form. 


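Theorem 7.4.1 (Real Schur Decomposition) If $A \in \mathbb{R}^{n\times n}$, then there exists an orthogonal $Q \in \mathbb{R}^{n\times n}$ such that

$$Q^T A Q = \begin{bmatrix} R_{11} & R_{12} & \cdots & R_{1m} \\ 0 & R_{22} & \cdots & R_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & R_{mm} \end{bmatrix} \tag{7.4.2}$$

where each $R_{ii}$ is either a 1-by-1 matrix or a 2-by-2 matrix having complex conjugate eigenvalues.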


Proof. The complex eigenvalues of $A$ must come in conjugate pairs, since the characteristic polynomial $\det(zI - A)$ has real coefficients. Let $k$ be the number of complex conjugate pairs in $\lambda(A)$. We prove the theorem by induction on $k$. Observe first that Lemma 7.1.2 and Theorem 7.1.3 have obvious real analogs. Thus, the theorem holds if $k = 0$. Now suppose that $k \ge 1$. If $\lambda = \gamma + i\mu \in \lambda(A)$ and $\mu \ne 0$, then there exist vectors $y$ and $z$ in $\mathbb{R}^n$ ($z \ne 0$) such that $A(y + iz) = (\gamma + i\mu)(y + iz)$, i.e.,

$$A \, [\, y \;\; z \,] = [\, y \;\; z \,] \begin{bmatrix} \gamma & \mu \\ -\mu & \gamma \end{bmatrix}.$$

The assumption that $\mu \ne 0$ implies that $y$ and $z$ span a two-dimensional, real invariant subspace for $A$. It then follows from Lemma 7.1.2 that an orthogonal $U \in \mathbb{R}^{n\times n}$ exists such that

$$U^T A U = \begin{bmatrix} T_{11} & T_{12} \\ 0 & T_{22} \end{bmatrix}$$

where $T_{11}$ is 2-by-2, $T_{22}$ is (n-2)-by-(n-2), and $\lambda(T_{11}) = \{\lambda, \bar{\lambda}\}$. By induction, there exists an orthogonal $\tilde{U}$ so $\tilde{U}^T T_{22} \tilde{U}$ has the required structure. The theorem follows by setting $Q = U\,\mathrm{diag}(I_2, \tilde{U})$. □


The theorem shows that any real matrix is orthogonally similar to an upper quasi-triangular matrix. It is clear that the real and imaginary parts of the complex eigenvalues can be easily obtained from the 2-by-2 diagonal blocks.
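Reading the eigenvalues off a 2-by-2 diagonal block amounts to solving the quadratic $z^2 - (a+d)z + (ad - bc) = 0$. A minimal sketch follows; the function name is our own. When $((a-d)/2)^2 + bc < 0$, the real part is $(a+d)/2$ and the imaginary parts are $\pm\sqrt{-((a-d)/2)^2 - bc}$.

```python
import cmath
import math


def block_eigenvalues(a, b, c, d):
    """Eigenvalues of the real 2-by-2 block [[a, b], [c, d]] (illustrative helper):
    p +/- sqrt(((a - d)/2)^2 + bc) with p = (a + d)/2."""
    p = 0.5 * (a + d)
    disc = (0.5 * (a - d)) ** 2 + b * c
    root = cmath.sqrt(disc)   # purely imaginary when disc < 0
    return p + root, p - root


lam1, lam2 = block_eigenvalues(1.0, -2.0, 1.0, 1.0)
print(lam1, lam2)   # a complex conjugate pair with real part 1
```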


7.4.2 A Hessenberg QR Step 


We now turn our attention to the speedy calculation of a single QR step in (7.4.1). In this regard, the most glaring shortcoming associated with (7.4.1) is that each step requires a full QR factorization costing $O(n^3)$ flops. Fortunately, the amount of work per iteration can be reduced by an order of magnitude if the orthogonal matrix $U_0$ is judiciously chosen. In particular, if $U_0^T A U_0 = H_0 = (h_{ij})$ is upper Hessenberg ($h_{ij} = 0$, $i > j+1$), then each subsequent $H_k$ requires only $O(n^2)$ flops to calculate. To see this we look at the computations $H = QR$ and $H_+ = RQ$ when $H$ is upper Hessenberg. As described in §5.2.4, we can upper triangularize $H$ with a sequence of $n-1$ Givens rotations: $Q^T H = G_{n-1}^T \cdots G_1^T H = R$. Here, $G_i = G(i, i+1, \theta_i)$. For the $n = 4$ case there are three Givens premultiplications:

    x x x x         x x x x         x x x x         x x x x
    x x x x   -->   0 x x x   -->   0 x x x   -->   0 x x x
    0 x x x         0 x x x         0 0 x x         0 0 x x
    0 0 x x         0 0 x x         0 0 x x         0 0 0 x

See Algorithm 5.2.3.


The computation $RQ = R(G_1 \cdots G_{n-1})$ is equally easy to implement. In the $n = 4$ case there are three Givens post-multiplications:

    x x x x         x x x x         x x x x         x x x x
    0 x x x   -->   x x x x   -->   x x x x   -->   x x x x
    0 0 x x         0 0 x x         0 x x x         0 x x x
    0 0 0 x         0 0 0 x         0 0 0 x         0 0 x x

Overall we obtain the following algorithm: 


Algorithm 7.4.1 If $H$ is an n-by-n upper Hessenberg matrix, then this algorithm overwrites $H$ with $H_+ = RQ$ where $H = QR$ is the QR factorization of $H$.

  for k = 1:n-1
    [c(k), s(k)] = givens(H(k,k), H(k+1,k))
    H(k:k+1, k:n) = [c(k) s(k); -s(k) c(k)]^T H(k:k+1, k:n)
  end
  for k = 1:n-1
    H(1:k+1, k:k+1) = H(1:k+1, k:k+1) [c(k) s(k); -s(k) c(k)]
  end


Let $G_k = G(k, k+1, \theta_k)$ be the $k$th Givens rotation. It is easy to confirm that the matrix $Q = G_1 \cdots G_{n-1}$ is upper Hessenberg. Thus, $RQ = H_+$ is also upper Hessenberg. The algorithm requires about $6n^2$ flops and thus is an order-of-magnitude quicker than a full matrix QR step (7.3.1).


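Example 7.4.1 If Algorithm 7.4.1 is applied to

$$H = \begin{bmatrix} 3 & 1 & 5 \\ 4 & 2 & 3 \\ 0 & .01 & 1 \end{bmatrix},$$

then

$$G_1 = \begin{bmatrix} .6 & -.8 & 0 \\ .8 & .6 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \qquad G_2 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & .9996 & -.0249 \\ 0 & .0249 & .9996 \end{bmatrix},$$

and

$$H_+ = \begin{bmatrix} 4.7600 & -2.5442 & 5.4653 \\ .3200 & .1856 & -2.1796 \\ .0000 & .0263 & 1.0540 \end{bmatrix}.$$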
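A pure-Python sketch of Algorithm 7.4.1. The test matrix is $H$ of Example 7.4.1 with $h_{13} = 5$, the value consistent with the printed $G_2$ and $H_+$. The sign convention in `givens` below is an assumption made for this sketch: it takes $c = a/r$, $s = -b/r$ with $r = \mathrm{hypot}(a, b) \ge 0$, because this choice reproduces the printed signs. Any other valid convention changes $H_+$ only by a similarity with a diagonal matrix of $\pm 1$ entries. The printed example values are rounded, so agreement is only to about three decimals.

```python
import math


def givens(a, b):
    """(c, s) with [[c, s], [-s, c]]^T [a, b]^T = [r, 0]^T and r = hypot(a, b) >= 0.
    Sign convention assumed for this sketch."""
    if b == 0.0:
        return 1.0, 0.0
    r = math.hypot(a, b)
    return a / r, -b / r


def hessenberg_qr_step(H):
    """Algorithm 7.4.1 on a copy of the upper Hessenberg matrix H; returns H+ = RQ."""
    n = len(H)
    H = [list(map(float, row)) for row in H]
    rotations = []
    for k in range(n - 1):
        c, s = givens(H[k][k], H[k + 1][k])
        rotations.append((c, s))
        for j in range(k, n):                # rows k, k+1 <- G_k^T (rows k, k+1)
            t1, t2 = H[k][j], H[k + 1][j]
            H[k][j] = c * t1 - s * t2
            H[k + 1][j] = s * t1 + c * t2
    for k, (c, s) in enumerate(rotations):
        for i in range(min(k + 2, n)):       # cols k, k+1 <- (cols k, k+1) G_k
            t1, t2 = H[i][k], H[i][k + 1]
            H[i][k] = c * t1 - s * t2
            H[i][k + 1] = s * t1 + c * t2
    return H


H = [[3.0, 1.0, 5.0],
     [4.0, 2.0, 3.0],
     [0.0, 0.01, 1.0]]
H_plus = hessenberg_qr_step(H)
for row in H_plus:
    print(" ".join(f"{x:8.4f}" for x in row))
```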
7.4.3 The Hessenberg Reduction

It remains for us to show how the Hessenberg decomposition

$$U_0^T A U_0 = H, \qquad U_0^T U_0 = I \tag{7.4.3}$$

can be computed. The transformation $U_0$ can be computed as a product of Householder matrices $P_1, \ldots, P_{n-2}$. The role of $P_k$ is to zero the $k$th column below the subdiagonal. In the $n = 6$ case, we have

    x x x x x x          x x x x x x          x x x x x x
    x x x x x x          x x x x x x          x x x x x x
    x x x x x x   P1     0 x x x x x   P2     0 x x x x x
    x x x x x x   -->    0 x x x x x   -->    0 0 x x x x
    x x x x x x          0 x x x x x          0 0 x x x x
    x x x x x x          0 x x x x x          0 0 x x x x

          x x x x x x          x x x x x x
          x x x x x x          x x x x x x
    P3    0 x x x x x   P4     0 x x x x x
    -->   0 0 x x x x   -->    0 0 x x x x
          0 0 0 x x x          0 0 0 x x x
          0 0 0 x x x          0 0 0 0 x x

In general, after $k-1$ steps we have computed $k-1$ Householder matrices $P_1, \ldots, P_{k-1}$ such that

$$(P_1 \cdots P_{k-1})^T A (P_1 \cdots P_{k-1}) = \begin{bmatrix} B_{11} & B_{12} & B_{13} \\ B_{21} & B_{22} & B_{23} \\ 0 & B_{32} & B_{33} \end{bmatrix}$$

(with block rows and columns of sizes $k-1$, $1$, and $n-k$) is upper Hessenberg through its first $k-1$ columns. Suppose $\tilde{P}_k$ is an order $n-k$ Householder matrix such that $\tilde{P}_k B_{32}$ is a multiple of $e_1^{(n-k)}$. If $P_k = \mathrm{diag}(I_k, \tilde{P}_k)$, then

$$(P_1 \cdots P_k)^T A (P_1 \cdots P_k) = \begin{bmatrix} B_{11} & B_{12} & B_{13}\tilde{P}_k \\ B_{21} & B_{22} & B_{23}\tilde{P}_k \\ 0 & \tilde{P}_k B_{32} & \tilde{P}_k B_{33} \tilde{P}_k \end{bmatrix}$$

is upper Hessenberg through its first $k$ columns. Repeating this for $k = 1{:}n-2$ we obtain


Algorithm 7.4.2 (Householder Reduction to Hessenberg Form) Given $A \in \mathbb{R}^{n\times n}$, the following algorithm overwrites $A$ with $H = U_0^T A U_0$ where $H$ is upper Hessenberg and $U_0$ is a product of Householder matrices.

  for k = 1:n-2
    [v, β] = house(A(k+1:n, k))
    A(k+1:n, k:n) = (I - βvv^T) A(k+1:n, k:n)
    A(1:n, k+1:n) = A(1:n, k+1:n)(I - βvv^T)
  end


This algorithm requires $10n^3/3$ flops. If $U_0$ is explicitly formed, an additional $4n^3/3$ flops are required. The $k$th Householder matrix can be represented in A(k+2:n, k). See Martin and Wilkinson (1968d) for a detailed description.

The roundoff properties of this method for reducing $A$ to Hessenberg form are very desirable. Wilkinson (1965, p.351) states that the computed Hessenberg matrix $\hat{H}$ satisfies $\hat{H} = Q^T(A + E)Q$, where $Q$ is orthogonal and $\| E \|_F \le cn^2 u \| A \|_F$ with $c$ a small constant.
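A pure-Python sketch of Algorithm 7.4.2. The helper `house` follows the usual construction: $v(1) = 1$, and $(I - \beta vv^T)x = \| x \|_2 e_1$ whenever $x(2{:}n) \ne 0$. Treat it as our sketch of the routine the text calls `house`, not a verbatim copy. The 3-by-3 input is our own choice. For it, $\tilde{P}_1 = [.6\ .8;\ .8\ -.6]$, and a hand computation gives $H = [1\ 8.6\ -.2;\ 5\ 4.96\ -.72;\ 0\ 2.28\ -3.96]$. A random 6-by-6 matrix is also reduced, to check the zero pattern and the invariance of the trace and the Frobenius norm.

```python
import math
import random


def house(x):
    """Householder vector (v, beta) with v[0] = 1 (sketch of the standard routine)."""
    sigma = sum(t * t for t in x[1:])
    if sigma == 0.0:
        return [1.0] + [0.0] * (len(x) - 1), 0.0
    mu = math.sqrt(x[0] ** 2 + sigma)
    v0 = x[0] - mu if x[0] <= 0.0 else -sigma / (x[0] + mu)   # avoids cancellation
    beta = 2.0 * v0 ** 2 / (sigma + v0 ** 2)
    return [1.0] + [t / v0 for t in x[1:]], beta


def hessenberg(A):
    """Algorithm 7.4.2 applied to a copy of A; returns H = U_0^T A U_0."""
    n = len(A)
    H = [list(map(float, row)) for row in A]
    for k in range(n - 2):
        v, beta = house([H[i][k] for i in range(k + 1, n)])
        for j in range(k, n):            # H(k+1:n, k:n) <- (I - beta v v^T) H(k+1:n, k:n)
            w = beta * sum(v[i - k - 1] * H[i][j] for i in range(k + 1, n))
            for i in range(k + 1, n):
                H[i][j] -= w * v[i - k - 1]
        for i in range(n):               # H(1:n, k+1:n) <- H(1:n, k+1:n)(I - beta v v^T)
            w = beta * sum(H[i][j] * v[j - k - 1] for j in range(k + 1, n))
            for j in range(k + 1, n):
                H[i][j] -= w * v[j - k - 1]
    return H


A = [[1.0, 5.0, 7.0],
     [3.0, 0.0, 6.0],
     [4.0, 3.0, 1.0]]
H = hessenberg(A)
for row in H:
    print(" ".join(f"{x:7.2f}" for x in row))

random.seed(0)
B = [[random.uniform(-1.0, 1.0) for _ in range(6)] for _ in range(6)]
HB = hessenberg(B)
worst = max(abs(HB[i][j]) for i in range(6) for j in range(6) if i > j + 1)
print("largest entry below the subdiagonal:", worst)
```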


Example 7.4.2 If 


then 


7.4.4 Level-3 Aspects 


The Hessenberg reduction (Algorithm 7.4.2) is rich in level-2 operations: half gaxpys and half outer product updates. We briefly discuss two methods for introducing level-3 computations into the process.

The first approach involves a block reduction to block Hessenberg form and is quite straightforward. Suppose (for clarity) that $n = rN$ and write

$$A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}$$

where $A_{11}$ is r-by-r and $A_{22}$ is (n-r)-by-(n-r). Suppose that we have computed the QR factorization $A_{21} = \tilde{Q}_1 R_1$ and that $\tilde{Q}_1$ is in WY form. That is, we have $W_1, Y_1 \in \mathbb{R}^{(n-r)\times r}$ such that $\tilde{Q}_1 = I - W_1 Y_1^T$. (See §5.2.2 for details.) If $Q_1 = \mathrm{diag}(I_r, \tilde{Q}_1)$ then

$$Q_1^T A Q_1 = \begin{bmatrix} A_{11} & A_{12}\tilde{Q}_1 \\ R_1 & \tilde{Q}_1^T A_{22} \tilde{Q}_1 \end{bmatrix}.$$

Notice that the updates of the (1,2) and (2,2) blocks are rich in level-3 operations given that $\tilde{Q}_1$ is in WY form. This fully illustrates the overall process as $Q_1^T A Q_1$ is block upper Hessenberg through its first block column. We next repeat the computations on the first $r$ columns of $\tilde{Q}_1^T A_{22} \tilde{Q}_1$. After $N-2$ such steps we obtain

$$H = U_0^T A U_0 = \begin{bmatrix} H_{11} & H_{12} & \cdots & \cdots & H_{1N} \\ H_{21} & H_{22} & \cdots & \cdots & H_{2N} \\ 0 & \ddots & \ddots & & \vdots \\ \vdots & \ddots & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & H_{N,N-1} & H_{NN} \end{bmatrix}$$

where each $H_{ij}$ is r-by-r and $U_0 = Q_1 \cdots Q_{N-2}$ with each $Q_i$ in WY form. The overall algorithm has a level-3 fraction of the form $1 - O(1/N)$.

Note that the subdiagonal blocks in $H$ are upper triangular and so the matrix has lower bandwidth $r$. It is possible to reduce $H$ to actual Hessenberg form by using Givens rotations to zero all but the first subdiagonal.

Dongarra, Hammarling and Sorensen (1987) have shown how to proceed directly to Hessenberg form using a mixture of gaxpys and level-3 updates. Their idea involves minimal updating after each Householder transformation is generated. For example, suppose the first Householder $P_1$ has been computed. To generate $P_2$ we need just the second column of $P_1 A P_1$, not the full outer product update. To generate $P_3$ we need just the third column of $P_2 P_1 A P_1 P_2$, etc. In this way, the Householder matrices can be determined using only gaxpy operations. No outer product updates are involved. Once a suitable number of Householder matrices are known they can be aggregated and applied in a level-3 fashion.


7.4.5 Important Hessenberg Matrix Properties 


The Hessenberg decomposition is not unique. If $Z$ is any n-by-n orthogonal matrix and we apply Algorithm 7.4.2 to $Z^T A Z$, then $Q^T A Q = H$ is upper Hessenberg where $Q = Z U_0$. However, $Qe_1 = Z(U_0 e_1) = Ze_1$ suggesting that $H$ is unique once the first column of $Q$ is specified. This is essentially the case provided $H$ has no zero subdiagonal entries. Hessenberg matrices with this property are said to be unreduced. Here is a very important theorem that clarifies the uniqueness of the Hessenberg reduction.


Theorem 7.4.2 (Implicit Q Theorem) Suppose $Q = [\, q_1, \ldots, q_n \,]$ and $V = [\, v_1, \ldots, v_n \,]$ are orthogonal matrices with the property that both $Q^T A Q = H$ and $V^T A V = G$ are upper Hessenberg where $A \in \mathbb{R}^{n\times n}$. Let $k$ denote the smallest positive integer for which $h_{k+1,k} = 0$, with the convention that $k = n$ if $H$ is unreduced. If $q_1 = v_1$, then $q_i = \pm v_i$ and $|h_{i,i-1}| = |g_{i,i-1}|$ for $i = 2{:}k$. Moreover, if $k < n$, then $g_{k+1,k} = 0$.


Proof. Define the orthogonal matrix $W = [w_1, \ldots, w_n] = V^TQ$ and observe that $GW = WH$. By comparing column $i-1$ in this equation for $i = 2{:}k$ we see that

$$h_{i,i-1}\,w_i = Gw_{i-1} - \sum_{j=1}^{i-1} h_{j,i-1}\,w_j .$$

Since $w_1 = V^Tq_1 = V^Tv_1 = e_1$, it follows that $[w_1, \ldots, w_k]$ is upper triangular and thus $w_i = \pm I_n(:,i) = \pm e_i$ for $i = 2{:}k$. Since $w_i = V^Tq_i$ and $h_{i,i-1} = w_i^TGw_{i-1}$ it follows that $v_i = \pm q_i$ and

$$|h_{i,i-1}| = |q_i^TAq_{i-1}| = |v_i^TAv_{i-1}| = |g_{i,i-1}|$$

for $i = 2{:}k$. If $k < n$, then

$$g_{k+1,k} = e_{k+1}^TGe_k = \pm e_{k+1}^TGWe_k = \pm e_{k+1}^TWHe_k = \pm e_{k+1}^T\sum_{i=1}^{k} h_{ik}\,We_i = \sum_{i=1}^{k} \pm h_{ik}\,e_{k+1}^Te_i = 0. \quad \square$$


The gist of the implicit Q theorem is that if $Q^TAQ = H$ and $Z^TAZ = G$ are each unreduced upper Hessenberg matrices and $Q$ and $Z$ have the same first column, then $G$ and $H$ are "essentially equal" in the sense that $G = D^{-1}HD$ where $D = \mathrm{diag}(\pm 1, \ldots, \pm 1)$.

Our next theorem involves a new type of matrix called a Krylov matrix. If $A \in \mathbb{R}^{n\times n}$ and $v \in \mathbb{R}^n$, then the Krylov matrix $K(A,v,j) \in \mathbb{R}^{n\times j}$ is defined by

$$K(A,v,j) = [\,v,\ Av,\ \ldots,\ A^{j-1}v\,].$$

It turns out that there is a connection between the Hessenberg reduction $Q^TAQ = H$ and the QR factorization of the Krylov matrix $K(A, Q(:,1), n)$.


Theorem 7.4.3 Suppose $Q \in \mathbb{R}^{n\times n}$ is an orthogonal matrix and $A \in \mathbb{R}^{n\times n}$. Then $Q^TAQ = H$ is an unreduced upper Hessenberg matrix if and only if $Q^TK(A, Q(:,1), n) = R$ is nonsingular and upper triangular.


Proof. Suppose $Q \in \mathbb{R}^{n\times n}$ is orthogonal and set $H = Q^TAQ$. Consider the identity

$$Q^TK(A, Q(:,1), n) = [\,e_1,\ He_1,\ \ldots,\ H^{n-1}e_1\,] = R.$$

If $H$ is an unreduced upper Hessenberg matrix, then it is clear that $R$ is upper triangular with $r_{ii} = h_{21}h_{32}\cdots h_{i,i-1}$ for $i = 2{:}n$. Since $r_{11} = 1$ it follows that $R$ is nonsingular.

To prove the converse, suppose $R$ is upper triangular and nonsingular. Since $R(:,k+1) = HR(:,k)$ it follows that $H(:,k) \in \mathrm{span}\{e_1, \ldots, e_{k+1}\}$. This implies that $H$ is upper Hessenberg. Since $r_{nn} = h_{21}h_{32}\cdots h_{n,n-1} \neq 0$ it follows that $H$ is also unreduced. $\square$


Thus, there is more or less a correspondence between nonsingular Krylov 
matrices and orthogonal similarity reductions to unreduced Hessenberg 
form. 
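As a small numerical check of Theorem 7.4.3, the Python sketch below takes $Q = I$, so that $H$ is itself an unreduced upper Hessenberg matrix. In that case $K(H, e_1, n)$ should be upper triangular with $r_{ii} = h_{21}h_{32}\cdots h_{i,i-1}$. The test matrix and the helper names `krylov_matrix` and `is_upper_triangular` are illustrative choices, not anything from the text. Exact rational arithmetic (`fractions.Fraction`) is used only so that the zero pattern can be compared exactly.

```python
from fractions import Fraction


def krylov_matrix(A, v, j):
    """Return K(A, v, j) = [v, Av, ..., A^(j-1) v] as a list of rows."""
    n = len(A)
    cols = [list(v)]
    for _ in range(j - 1):
        prev = cols[-1]
        cols.append([sum(A[i][k] * prev[k] for k in range(n)) for i in range(n)])
    return [[cols[c][r] for c in range(j)] for r in range(n)]  # columns -> rows


def is_upper_triangular(R):
    return all(R[i][j] == 0 for i in range(len(R)) for j in range(i))


# Unreduced upper Hessenberg test matrix (subdiagonal entries 2, 3, 7).
H = [[Fraction(x) for x in row] for row in
     [[4, 1, 2, 3],
      [2, 5, 1, 1],
      [0, 3, 6, 2],
      [0, 0, 7, 1]]]
e1 = [Fraction(1), Fraction(0), Fraction(0), Fraction(0)]

R = krylov_matrix(H, e1, 4)              # with Q = I, Q^T K(H, e1, n) = K(H, e1, n)
print(is_upper_triangular(R))            # True
print([int(R[i][i]) for i in range(4)])  # [1, 2, 6, 42]: 1, h21, h21*h32, h21*h32*h43

# Zeroing h32 makes H reduced, and K(H, e1, n) becomes singular.
H_red = [row[:] for row in H]
H_red[2][1] = Fraction(0)
print(krylov_matrix(H_red, e1, 4)[3][3])  # 0
```

The second experiment illustrates the converse direction. Once a subdiagonal entry vanishes, the Krylov sequence is trapped in a proper invariant subspace, and $r_{nn} = 0$.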

Our last result concerns eigenvalues of an unreduced upper Hessenberg 
matrix. 


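Theorem 7.4.4 If $\lambda$ is an eigenvalue of an unreduced upper Hessenberg matrix $H \in \mathbb{R}^{n\times n}$, then its geometric multiplicity is one.


Proof. For any $\lambda \in \mathbb{C}$ we have $\mathrm{rank}(H - \lambda I) \geq n-1$ because the first $n-1$ columns of $H - \lambda I$ are independent. Rows $2{:}n$ of these columns form an upper triangular matrix whose diagonal entries $h_{21}, \ldots, h_{n,n-1}$ are nonzero. $\square$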


7.4.6 Companion Matrix Form 


Just as the Schur decomposition has a nonunitary analog in the Jordan decomposition, so does the Hessenberg decomposition have a nonunitary analog in the companion matrix decomposition. Let $x \in \mathbb{R}^n$ and suppose that the Krylov matrix $K = K(A, x, n)$ is nonsingular. If $c = c(0{:}n-1)$ solves the linear system $Kc = -A^nx$, then it follows that $AK = KC$ where $C$ has the form

$$C = \begin{bmatrix} 0 & 0 & \cdots & 0 & -c_0 \\ 1 & 0 & \cdots & 0 & -c_1 \\ 0 & 1 & \cdots & 0 & -c_2 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & -c_{n-1} \end{bmatrix}. \qquad (7.4.4)$$

The matrix $C$ is said to be a companion matrix. Since

$$\det(zI - C) = c_0 + c_1z + \cdots + c_{n-1}z^{n-1} + z^n,$$

it follows that if $K$ is nonsingular, then the decomposition $K^{-1}AK = C$ displays $A$'s characteristic polynomial. This, coupled with the sparseness of $C$, has led to "companion matrix methods" in various application areas. These techniques typically involve:

• Computing the Hessenberg decomposition $U_0^TAU_0 = H$.
• Hoping $H$ is unreduced and setting $Y = [e_1, He_1, \ldots, H^{n-1}e_1]$.
• Solving $YC = HY$ for $C$.
Unfortunately, this calculation can be highly unstable. A is similar to an
unreduced Hessenberg matrix only if each eigenvalue has unit geometric 
multiplicity. Matrices that have this property are called nonderogatory. It 
follows that the matrix Y above can be very poorly conditioned if A is close 
to a derogatory matrix. 

A full discussion of the dangers associated with companion matrix com- 
putation can be found in Wilkinson (1965, pp. 405 ff.). 
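The construction behind (7.4.4) can be sketched in a few lines of Python: form $K = K(A, x, n)$, solve $Kc = -A^nx$, assemble $C$, and confirm $AK = KC$. The sketch uses exact rational arithmetic. That deliberately sidesteps the instability discussed above, so it illustrates only the algebra, not a recommended numerical method. The test matrix and all function names (`companion_from_krylov`, `solve`, and so on) are illustrative assumptions.

```python
from fractions import Fraction


def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def solve(M, b):
    """Gaussian elimination with partial pivoting (exact for Fraction entries)."""
    n = len(M)
    W = [list(row) + [rhs] for row, rhs in zip(M, b)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(W[i][k]))
        W[k], W[p] = W[p], W[k]
        for i in range(k + 1, n):
            m = W[i][k] / W[k][k]
            for j in range(k, n + 1):
                W[i][j] -= m * W[k][j]
    x = [Fraction(0)] * n
    for i in reversed(range(n)):
        s = sum(W[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (W[i][n] - s) / W[i][i]
    return x


def companion_from_krylov(A, x):
    """Form K = K(A, x, n), solve K c = -A^n x, and build C as in (7.4.4)."""
    n = len(A)
    cols = [list(x)]
    for _ in range(n):
        cols.append(matvec(A, cols[-1]))          # x, Ax, ..., A^n x
    K = [[cols[j][i] for j in range(n)] for i in range(n)]
    c = solve(K, [-v for v in cols[n]])
    C = [[Fraction(0)] * n for _ in range(n)]
    for i in range(1, n):
        C[i][i - 1] = Fraction(1)                 # unit subdiagonal
    for i in range(n):
        C[i][n - 1] = -c[i]                       # last column is -c
    return K, c, C


A = [[Fraction(v) for v in row] for row in [[2, 1, 0], [1, 3, 1], [0, 1, 4]]]
x = [Fraction(1), Fraction(0), Fraction(0)]
K, c, C = companion_from_krylov(A, x)
print(matmul(A, K) == matmul(K, C))   # True, i.e., AK = KC
print([int(v) for v in c])            # [-18, 24, -9]
```

For this matrix the coefficients give $\det(zI - A) = z^3 - 9z^2 + 24z - 18$. This agrees with $\mathrm{trace}(A) = 9$ and $\det(A) = 18$.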


7.4.7 Hessenberg Reduction Via Gauss Transforms 


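While we are on the subject of nonorthogonal reduction to Hessenberg form, we should mention that Gauss transformations can be used in lieu of Householder matrices in Algorithm 7.4.2. In particular, suppose permutations $\Pi_1, \ldots, \Pi_{k-1}$ and Gauss transformations $M_1, \ldots, M_{k-1}$ have been determined such that

$$(M_{k-1}\Pi_{k-1}\cdots M_1\Pi_1)\,A\,(M_{k-1}\Pi_{k-1}\cdots M_1\Pi_1)^{-1} = B$$

where

$$B = \begin{bmatrix} B_{11} & B_{12} & B_{13} \\ B_{21} & B_{22} & B_{23} \\ 0 & B_{32} & B_{33} \end{bmatrix} \begin{matrix} k-1 \\ 1 \\ n-k \end{matrix}$$

(the column blocks have widths $k-1$, $1$, and $n-k$) is upper Hessenberg through its first $k-1$ columns. A permutation $\tilde\Pi_k$ of order $n-k$ is then determined such that the first element of $\tilde\Pi_kB_{32}$ is maximal in absolute value. This makes it possible to determine a stable Gauss transformation $\tilde M_k = I - z_ke_1^T$, also of order $n-k$, such that all but the first component of $\tilde M_k(\tilde\Pi_kB_{32})$ is zero. Defining $\Pi_k = \mathrm{diag}(I_k, \tilde\Pi_k)$ and $M_k = \mathrm{diag}(I_k, \tilde M_k)$, we see that

$$(M_k\Pi_k\cdots M_1\Pi_1)\,A\,(M_k\Pi_k\cdots M_1\Pi_1)^{-1} = \begin{bmatrix} B_{11} & B_{12} & B_{13}\tilde\Pi_k^T\tilde M_k^{-1} \\ B_{21} & B_{22} & B_{23}\tilde\Pi_k^T\tilde M_k^{-1} \\ 0 & \tilde M_k\tilde\Pi_kB_{32} & \tilde M_k\tilde\Pi_kB_{33}\tilde\Pi_k^T\tilde M_k^{-1} \end{bmatrix}$$

is upper Hessenberg through its first $k$ columns. Note that $\tilde M_k^{-1} = I + z_ke_1^T$ and so some very simple rank-one updates are involved in the reduction.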
A careful operation count reveals that the Gauss reduction to Hessenberg form requires only half the number of flops of the Householder method. However, as in the case of Gaussian elimination with partial pivoting, there is a (fairly remote) chance of $2^n$ growth. See Businger (1969). Another difficulty associated with the Gauss approach is that the eigenvalue condition numbers, the $s(\lambda)^{-1}$, are not preserved with nonorthogonal similarity transformations and this complicates the error estimation process.
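A minimal Python sketch of the Gauss reduction just described follows. For each column $k$ it swaps the largest subdiagonal candidate into position $k+1$, applying the interchange to both rows and columns ($\Pi_k$). It then eliminates below the subdiagonal with multipliers of magnitude at most one, and applies the inverse Gauss transform to the columns ($M_k^{-1} = I + z_ke_1^T$ in the notation above). The function name `gauss_hessenberg` and the test matrix are illustrative. Exact `Fraction` arithmetic is an assumption made so the checks are exact. The checks confirm the Hessenberg pattern and the similarity invariants $\mathrm{trace}(H)$, $\mathrm{trace}(H^2)$, and $\mathrm{trace}(H^4)$.

```python
from fractions import Fraction


def gauss_hessenberg(A):
    """Reduce A to upper Hessenberg form with stabilized elementary similarity
    transformations: an interchange (Pi_k) followed by a Gauss transform (M_k)."""
    H = [[Fraction(v) for v in row] for row in A]
    n = len(H)
    for k in range(n - 2):
        # Pivot: largest entry of column k on or below the subdiagonal.
        p = max(range(k + 1, n), key=lambda i: abs(H[i][k]))
        if p != k + 1:
            H[k + 1], H[p] = H[p], H[k + 1]          # Pi_k A
            for row in H:                            # (Pi_k A) Pi_k^T
                row[k + 1], row[p] = row[p], row[k + 1]
        if H[k + 1][k] == 0:
            continue                                 # column already reduced
        for i in range(k + 2, n):
            m = H[i][k] / H[k + 1][k]                # |m| <= 1 thanks to pivoting
            if m == 0:
                continue
            for j in range(n):                       # row i -= m * row k+1
                H[i][j] -= m * H[k + 1][j]
            for r in range(n):                       # col k+1 += m * col i
                H[r][k + 1] += m * H[r][i]
    return H


def trace(M):
    return sum(M[i][i] for i in range(len(M)))


def square(M):
    n = len(M)
    return [[sum(M[i][k] * M[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


A = [[4, 1, 2, 3], [2, 5, 1, 1], [6, 1, 3, 2], [1, 2, 7, 1]]
H = gauss_hessenberg(A)
print(all(H[i][j] == 0 for i in range(4) for j in range(i - 1)))   # True
print(trace(H) == trace(A), trace(square(H)) == trace(square(A)))  # True True
```

Each multiplier is applied as a row operation and then undone on the columns, so every elementary step is a similarity. The later columns never reintroduce nonzeros into the columns already reduced.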


Problems 


P7.4.1 Suppose $A \in \mathbb{R}^{n\times n}$ and $z \in \mathbb{R}^n$. Give a detailed algorithm for computing an orthogonal $Q$ such that $Q^TAQ$ is upper Hessenberg and $Q^Tz$ is a multiple of $e_1$. (Hint: Reduce $z$ first and then apply Algorithm 7.4.2.)

P7.4.2 Specify a complete reduction to Hessenberg form using Gauss transformations and verify that it only requires $5n^3/3$ flops.

P7.4.3 In some situations, it is necessary to solve the linear system $(A + zI)x = b$ for many different values of $z \in \mathbb{R}$ and $b \in \mathbb{R}^n$. Show how this problem can be efficiently and stably solved using the Hessenberg decomposition.

P7.4.4 Give a detailed algorithm for explicitly computing the matrix $U_0$ in Algorithm 7.4.2. Design your algorithm so that $H$ is overwritten by $U_0$.

P7.4.5 Suppose $H \in \mathbb{R}^{n\times n}$ is an unreduced upper Hessenberg matrix. Show that there exists a diagonal matrix $D$ such that each subdiagonal element of $D^{-1}HD$ is equal to one. What is $\kappa_2(D)$?

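P7.4.6 Suppose $W, Y \in \mathbb{R}^{n\times n}$ and define the matrices $C$ and $B$ by

$$C = W + iY, \qquad B = \begin{bmatrix} W & -Y \\ Y & W \end{bmatrix}.$$

Show that if $\lambda \in \lambda(C)$ is real, then $\lambda \in \lambda(B)$. Relate the corresponding eigenvectors.

P7.4.7 Suppose $A = \begin{bmatrix} w & x \\ y & z \end{bmatrix}$ is a real matrix having eigenvalues $\lambda \pm i\mu$, where $\mu$ is nonzero. Give an algorithm that stably determines $c = \cos(\theta)$ and $s = \sin(\theta)$ such that

$$\begin{bmatrix} c & s \\ -s & c \end{bmatrix}^T \begin{bmatrix} w & x \\ y & z \end{bmatrix} \begin{bmatrix} c & s \\ -s & c \end{bmatrix} = \begin{bmatrix} \lambda & \beta \\ \alpha & \lambda \end{bmatrix}$$

where $\alpha\beta = -\mu^2$.

P7.4.8 Suppose $(\lambda, x)$ is a known eigenvalue-eigenvector pair for the upper Hessenberg matrix $H \in \mathbb{R}^{n\times n}$. Give an algorithm for computing an orthogonal matrix $P$ such that

$$P^THP = \begin{bmatrix} \lambda & w^T \\ 0 & H_1 \end{bmatrix}$$

where $H_1 \in \mathbb{R}^{(n-1)\times(n-1)}$ is upper Hessenberg. Compute $P$ as a product of Givens rotations.

P7.4.9 Suppose $H \in \mathbb{R}^{n\times n}$ has lower bandwidth $p$. Show how to compute $Q \in \mathbb{R}^{n\times n}$, a product of Givens rotations, such that $Q^THQ$ is upper Hessenberg. How many flops are required?

P7.4.10 Show that if $C$ is a companion matrix with distinct eigenvalues $\lambda_1, \ldots, \lambda_n$, then $VCV^{-1} = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$ where

$$V = \begin{bmatrix} 1 & \lambda_1 & \cdots & \lambda_1^{n-1} \\ 1 & \lambda_2 & \cdots & \lambda_2^{n-1} \\ \vdots & \vdots & & \vdots \\ 1 & \lambda_n & \cdots & \lambda_n^{n-1} \end{bmatrix}.$$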


Notes and References for Sec. 7.4 


The real Schur decomposition was originally presented in 
F.D. Murnaghan and A. Wintner (1931). “A Canonical Form for Real Matrices Under
Orthogonal Transformations,” Proc. Nat. Acad. Sci. 17, 417-20. 


A thorough treatment of the reduction to Hessenberg form is given in Wilkinson (1965, 
chapter 6), and Algol procedures for both the Householder and Gauss methods appear in 


R.S. Martin and J.H. Wilkinson (1968). “Similarity Reduction of a General Matrix to 
Hessenberg Form,” Numer. Math. 12, 349-68. See also Wilkinson and Reinsch 
(1971, pp. 339-58).


Fortran versions of the Algol procedures in the last reference are in Eispack. 
Givens rotations can also be used to compute the Hessenberg decomposition. See 


W. Rath (1982). “Fast Givens Rotations for Orthogonal Similarity,” Numer. Math. 40, 
47-56. 


The high performance computation of the Hessenberg reduction is discussed in 


J.J. Dongarra, L. Kaufman, and S. Hammarling (1986). “Squeezing the Most Out of 
Eigenvalue Solvers on High Performance Computers,” Lin. Alg. and Its Applic. 77, 
113-136. 

J.J. Dongarra, S. Hammarling, and D.C. Sorensen (1989). “Block Reduction of Matrices 
to Condensed Forms for Eigenvalue Computations,” JACM 27, 215-227. 

M.W. Berry, J.J. Dongarra, and Y. Kim (1995). “A Parallel Algorithm for the Reduction 
of a Nonsymmetric Matrix to Block Upper Hessenberg Form,” Parallel Computing 
21, 1189-1211. 


The possibility of exponential growth in the Gauss transformation approach was first 
pointed out in 


P. Businger (1969). “Reducing a Matrix to Hessenberg Form,” Math. Comp. 23, 819-21. 


However, the algorithm should be regarded in the same light as Gaussian elimination 
with partial pivoting—stable for all practical purposes. See Eispack, pp. 56-58. 


Aspects of the Hessenberg decomposition for sparse matrices are discussed in 


I.S. Duff and J.K. Reid (1975). “On the Reduction of Sparse Matrices to Condensed
Forms by Similarity Transformations,” J. Inst. Math. Applic. 15, 217-24. 


Once an eigenvalue of an unreduced upper Hessenberg matrix is known, it is possible to 
zero the last subdiagonal entry using Givens similarity transformations. See 


P.A. Businger (1971). “Numerically Stable Deflation of Hessenberg and Symmetric Tridiagonal Matrices,” BIT 11, 262-70.


Some interesting mathematical properties of the Hessenberg form may be found in 


B.N. Parlett (1967). “Canonical Decomposition of Hessenberg Matrices,” Math. Comp. 
21, 223-27. 

Y. Ikebe (1979). “On Inverses of Hessenberg Matrices,” Lin. Alg. and Its Applic. 24, 
93-97. 


Although the Hessenberg decomposition is largely appreciated as a "front end" decomposition for the QR iteration, it is increasingly popular as a cheap alternative to the
more expensive Schur decomposition in certain problems. For a sampling of applications 
where it has proven to be very useful, consult 


W. Enright (1979). "On the Efficient and Reliable Numerical Solution of Large Linear 
Systems of O.D.E.'s," IEEE Trans. Auto. Cont. AC-24, 905-8. 
G.H. Golub, S. Nash and C. Van Loan (1979). “A Hessenberg-Schur Method for the
Problem AX + XB = C,” IEEE Trans. Auto. Cont. AC-24, 909-13.

A. Laub (1981). “Efficient Multivariable Frequency Response Computations,” IEEE 
Trans. Auto. Cont. AC-26, 407-8. 

C.C. Paige (1981). “Properties of Numerical Algorithms Related to Computing Control- 
lability,” IEEE Trans. Auto. Cont. AC-26, 130-38. 

G. Miminis and C.C. Paige (1982). “An Algorithm for Pole Assignment of Time Invariant 
Linear Systems,” International J. of Control 35, 341-354. 

C. Van Loan (1982). “Using the Hessenberg Decomposition in Control Theory,” in 
Algorithms and Theory in Filtering and Control , D.C. Sorensen and R.J. Wets 
(eds), Mathematical Programming Study No. 18, North Holland, Amsterdam, pp. 
102-11. 


The advisability of posing polynomial root problems as companion matrix eigenvalue
problems is discussed in


K.-C. Toh and L.N. Trefethen (1994). “Pseudozeros of Polynomials and Pseudospectra 
of Companion Matrices,” Numer. Math. 68, 403-425.


A. Edelman and H. Murakami (1995). “Polynomial Roots from Companion Matrix 
Eigenvalues,” Math. Comp. 64, 763-776. 


7.5 The Practical QR Algorithm 


We return to the Hessenberg QR iteration which we write as follows: 


    H = U_0^T A U_0        (Hessenberg Reduction)
    for k = 1, 2, ...
        H = UR             (QR factorization)        (7.5.1)
        H = RU
    end


Our aim in this section is to describe how the H’s converge to upper quasi- 
triangular form and to show how the convergence rate can be accelerated 
by incorporating shifts. 


7.5.1 Deflation 


Without loss of generality we may assume that each Hessenberg matrix $H$ in (7.5.1) is unreduced. If not, then at some stage we have

$$H = \begin{bmatrix} H_{11} & H_{12} \\ 0 & H_{22} \end{bmatrix} \begin{matrix} p \\ n-p \end{matrix}$$

where $1 \leq p < n$ and the problem decouples into two smaller problems involving $H_{11}$ and $H_{22}$. The term deflation is also used in this context, usually when $p = n-1$ or $n-2$.


In practice, decoupling occurs whenever a subdiagonal entry in $H$ is suitably small. For example, in Eispack if

$$|h_{p+1,p}| \leq cu\,(|h_{pp}| + |h_{p+1,p+1}|) \qquad (7.5.2)$$




for a small constant $c$, then $h_{p+1,p}$ is "declared" to be zero. This is justified since rounding errors of order $u\|H\|$ are already present throughout the matrix.
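The test (7.5.2) is easy to state in code. The Python sketch below takes the unit roundoff $u$ of IEEE double precision, half of `sys.float_info.epsilon`. The default constant `c=1.0` is an arbitrary illustrative choice, not the constant any particular library uses. The sketch zeroes every subdiagonal entry that passes the test and reports where $H$ now splits.

```python
import sys

UNIT_ROUNDOFF = sys.float_info.epsilon / 2   # u for IEEE double precision


def deflate(H, c=1.0):
    """Set h[p+1][p] to zero when |h[p+1][p]| <= c*u*(|h[p][p]| + |h[p+1][p+1]|),
    as in (7.5.2). Returns the 0-based rows p+1 at which H now decouples."""
    splits = []
    for p in range(len(H) - 1):
        bound = c * UNIT_ROUNDOFF * (abs(H[p][p]) + abs(H[p + 1][p + 1]))
        if abs(H[p + 1][p]) <= bound:
            H[p + 1][p] = 0.0
            splits.append(p + 1)
    return splits


H = [[4.0, 1.0, 2.0],
     [1e-20, 3.0, 1.0],
     [0.0, 0.5, 2.0]]
print(deflate(H))          # [1]
print(H[1][0], H[2][1])    # 0.0 0.5
```

The comparison is relative to the neighboring diagonal entries rather than to $\|H\|$ as a whole. A small diagonal pair therefore demands a correspondingly small subdiagonal before the split is declared.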


7.5.2 The Shifted QR Iteration 


Let $\mu \in \mathbb{R}$ and consider the iteration:

    H = U_0^T A U_0        (Hessenberg Reduction)
    for k = 1, 2, ...
        Determine a scalar μ.
        H - μI = UR        (QR factorization)        (7.5.3)
        H = RU + μI
    end

The scalar $\mu$ is referred to as a shift. Each matrix $H$ generated in (7.5.3) is similar to $A$, since $RU + \mu I = U^T(UR + \mu I)U = U^THU$. If we order the eigenvalues $\lambda_i$ of $A$ so that

$$|\lambda_1 - \mu| \geq \cdots \geq |\lambda_n - \mu|,$$

and $\mu$ is fixed from iteration to iteration, then the theory of §7.3 says that the $p$th subdiagonal entry in $H$ converges to zero with rate

$$\left|\frac{\lambda_{p+1} - \mu}{\lambda_p - \mu}\right|^k.$$

Of course, if $\lambda_p = \lambda_{p+1}$, then there is no convergence at all. But if, for example, $\mu$ is much closer to $\lambda_n$ than to the other eigenvalues, then the zeroing of the $(n, n-1)$ entry is rapid. In the extreme case we have the following:


Theorem 7.5.1 Let $\mu$ be an eigenvalue of an $n$-by-$n$ unreduced Hessenberg matrix $H$. If $\bar H = RU + \mu I$, where $H - \mu I = UR$ is the QR factorization of $H - \mu I$, then $\bar h_{n,n-1} = 0$ and $\bar h_{nn} = \mu$.


Proof. Since $H$ is an unreduced Hessenberg matrix the first $n-1$ columns of $H - \mu I$ are independent, regardless of $\mu$. Thus, if $UR = (H - \mu I)$ is the QR factorization then $r_{ii} \neq 0$ for $i = 1{:}n-1$. But if $H - \mu I$ is singular then $r_{11}\cdots r_{nn} = 0$. Thus, $r_{nn} = 0$ and $\bar H(n,:) = [0, \ldots, 0, \mu]$. $\square$


The theorem says that if we shift by an exact eigenvalue, then in exact 
arithmetic deflation occurs in one step. 


Example 7.5.1 If

$$H = \begin{bmatrix} 9 & -1 & -2 \\ 2 & 6 & -2 \\ 0 & 1 & 5 \end{bmatrix},$$

then $6 \in \lambda(H)$. If $UR = H - 6I$ is the QR factorization, then $\bar H = RU + 6I$ is given by

$$\bar H = \begin{bmatrix} 8.5384 & -3.7313 & -1.0090 \\ 0.6343 & 5.4615 & 1.3867 \\ 0.0000 & 0.0000 & 6.0000 \end{bmatrix}.$$
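The one-step deflation of Theorem 7.5.1 can be observed with an explicit shifted QR step. The Python sketch below factors $H - \mu I$ with Givens rotations and then forms $RU + \mu I$. The helper names `givens` and `shifted_qr_step` are illustrative. The rotation sign convention is an assumption, and another convention changes the result only by a $\mathrm{diag}(\pm 1)$ similarity. With the matrix of Example 7.5.1 and $\mu = 6$, the last row of $\bar H$ should come out as $[0, 0, 6]$ to working precision. The remaining entries should agree in magnitude with the matrix displayed above.

```python
import math


def givens(a, b):
    """Return (c, s) such that [c s; -s c] @ [a, b] = [r, 0] with r = hypot(a, b)."""
    if b == 0.0:
        return 1.0, 0.0
    r = math.hypot(a, b)
    return a / r, b / r


def shifted_qr_step(H, mu):
    """One explicit shifted QR step on upper Hessenberg H:
    factor H - mu*I = U R with Givens rotations, return R U + mu*I."""
    n = len(H)
    R = [[H[i][j] - (mu if i == j else 0.0) for j in range(n)] for i in range(n)]
    rotations = []
    for k in range(n - 1):                       # R := U^T (H - mu I)
        c, s = givens(R[k][k], R[k + 1][k])
        rotations.append((c, s))
        for j in range(n):
            a, b = R[k][j], R[k + 1][j]
            R[k][j], R[k + 1][j] = c * a + s * b, -s * a + c * b
        R[k + 1][k] = 0.0                        # the annihilated entry
    for k, (c, s) in enumerate(rotations):       # R := R U, one rotation at a time
        for i in range(n):
            a, b = R[i][k], R[i][k + 1]
            R[i][k], R[i][k + 1] = c * a + s * b, -s * a + c * b
    for i in range(n):
        R[i][i] += mu
    return R


H = [[9.0, -1.0, -2.0],
     [2.0, 6.0, -2.0],
     [0.0, 1.0, 5.0]]                            # det(H - 6I) = 0
Hbar = shifted_qr_step(H, 6.0)
for row in Hbar:
    print("  ".join(f"{v:8.4f}" for v in row))
```

Because $H$ is Hessenberg, only $n-1$ rotations are needed in each direction. This is the $O(n^2)$ cost per step that makes the Hessenberg form attractive.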


7.5.3 The Single Shift Strategy 


Now let us consider varying $\mu$ from iteration to iteration incorporating new information about $\lambda(A)$ as the subdiagonal entries converge to zero. A good heuristic is to regard $h_{nn}$ as the best approximate eigenvalue along the diagonal. If we shift by this quantity during each iteration, we obtain the single-shift QR iteration:

    for k = 1, 2, ...
        μ = H(n, n)
        H - μI = UR        (QR Factorization)        (7.5.4)
        H = RU + μI
    end


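If the $(n, n-1)$ entry converges to zero, it is likely to do so at a quadratic rate. To see this, we borrow an example from Stewart (1973, p. 366). Suppose $H$ is an unreduced upper Hessenberg matrix of the form

$$H = \begin{bmatrix} \times & \times & \times & \times & \times \\ \times & \times & \times & \times & \times \\ 0 & \times & \times & \times & \times \\ 0 & 0 & \times & \times & \times \\ 0 & 0 & 0 & \epsilon & h_{nn} \end{bmatrix}$$

and that we perform one step of the single-shift QR algorithm: $UR = H - h_{nn}I$, $\bar H = RU + h_{nn}I$. After $n-2$ steps in the reduction of $H - h_{nn}I$ to upper triangular form we obtain a matrix with the following structure:

$$\begin{bmatrix} \times & \times & \times & \times & \times \\ 0 & \times & \times & \times & \times \\ 0 & 0 & \times & \times & \times \\ 0 & 0 & 0 & a & b \\ 0 & 0 & 0 & \epsilon & 0 \end{bmatrix}.$$

It is not hard to show that the $(n, n-1)$ entry in $\bar H = RU + h_{nn}I$ is given by $-\epsilon^2b/(\epsilon^2 + a^2)$. Indeed, the final rotation $\begin{bmatrix} c & s \\ -s & c \end{bmatrix}$ with $c = a/\sqrt{a^2+\epsilon^2}$ and $s = \epsilon/\sqrt{a^2+\epsilon^2}$ gives $r_{nn} = -sb$, and postmultiplying by its transpose gives $\bar h_{n,n-1} = -s^2b$. If we assume that $\epsilon \ll a$, then it is clear that the new $(n, n-1)$ entry has order $\epsilon^2$, precisely what we would expect of a quadratically converging algorithm.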


Example 7.5.2 If

$$H = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 0 & 0.001 & 7 \end{bmatrix}$$

and $UR = H - 7I$ is the QR factorization, then $\bar H = RU + 7I$ is given by

$$\bar H \approx \begin{bmatrix} -0.5384 & 1.6908 & 0.8351 \\ 0.3076 & 6.5264 & -6.6555 \\ 0.0000 & 2\cdot 10^{-5} & 7.0119 \end{bmatrix}.$$

Near-perfect shifts as above almost always ensure a small $\bar h_{n,n-1}$. However, this is just a heuristic. There are examples in which $\bar h_{n,n-1}$ is a relatively large matrix entry even though $\sigma_{\min}(H - \mu I) \approx u$.
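To watch the quadratic convergence of (7.5.4), the Python sketch below repeats explicit shifted QR steps with $\mu = h_{nn}$ on the matrix of Example 7.5.2. It records $|h_{32}|$ after each step. The Givens-based step is the same illustrative construction as for Example 7.5.1, repeated so that the block runs on its own. The first recorded value should be near $2\cdot 10^{-5}$, as in the example. After that, each value should be roughly the square of the previous one, up to a modest constant.

```python
import math


def givens(a, b):
    """Return (c, s) such that [c s; -s c] @ [a, b] = [r, 0]."""
    if b == 0.0:
        return 1.0, 0.0
    r = math.hypot(a, b)
    return a / r, b / r


def shifted_qr_step(H, mu):
    """H - mu*I = U R via Givens rotations; return R U + mu*I."""
    n = len(H)
    R = [[H[i][j] - (mu if i == j else 0.0) for j in range(n)] for i in range(n)]
    rotations = []
    for k in range(n - 1):
        c, s = givens(R[k][k], R[k + 1][k])
        rotations.append((c, s))
        for j in range(n):
            a, b = R[k][j], R[k + 1][j]
            R[k][j], R[k + 1][j] = c * a + s * b, -s * a + c * b
        R[k + 1][k] = 0.0
    for k, (c, s) in enumerate(rotations):
        for i in range(n):
            a, b = R[i][k], R[i][k + 1]
            R[i][k], R[i][k + 1] = c * a + s * b, -s * a + c * b
    for i in range(n):
        R[i][i] += mu
    return R


def single_shift_qr(H, steps):
    """Run the single-shift iteration (7.5.4), recording |h(n, n-1)|."""
    history = []
    for _ in range(steps):
        H = shifted_qr_step(H, H[-1][-1])        # mu = H(n, n)
        history.append(abs(H[-1][-2]))
    return H, history


H0 = [[1.0, 2.0, 3.0],
      [4.0, 5.0, 6.0],
      [0.0, 0.001, 7.0]]
H, history = single_shift_qr(H0, 3)
for k, h in enumerate(history, start=1):
    print(f"step {k}: |h32| = {h:.1e}")
```

Only the shift strategy separates this loop from (7.5.3). Reading $\mu$ off the current iterate is what turns the linear rate of a fixed shift into the quadratic behavior analyzed above.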


7.5.4 The Double Shift Strategy 


Unfortunately, difficulties with (7.5.4) can be expected if at some stage the eigenvalues $a_1$ and $a_2$ of

$$G = \begin{bmatrix} h_{mm} & h_{mn} \\ h_{nm} & h_{nn} \end{bmatrix}, \qquad m = n-1, \qquad (7.5.5)$$

are complex for then $h_{nn}$ would tend to be a poor approximate eigenvalue.

A way around this difficulty is to perform two single-shift QR steps in succession using $a_1$ and $a_2$ as shifts:

$$\begin{aligned} H - a_1I &= U_1R_1 \\ H_1 &= R_1U_1 + a_1I \\ H_1 - a_2I &= U_2R_2 \\ H_2 &= R_2U_2 + a_2I \end{aligned} \qquad (7.5.6)$$

These equations can be manipulated to show that

$$(U_1U_2)(R_2R_1) = M \qquad (7.5.7)$$

where $M$ is defined by

$$M = (H - a_1I)(H - a_2I). \qquad (7.5.8)$$

Note that $M$ is a real matrix even if $G$'s eigenvalues are complex since

$$M = H^2 - sH + tI$$

where

$$s = a_1 + a_2 = h_{mm} + h_{nn} = \mathrm{trace}(G) \in \mathbb{R}$$

and

$$t = a_1a_2 = h_{mm}h_{nn} - h_{mn}h_{nm} = \det(G) \in \mathbb{R}.$$


Thus, (7.5.7) is the QR factorization of a real matrix and we may choose $U_1$ and $U_2$ so that $Z = U_1U_2$ is real orthogonal. It then follows that

$$H_2 = U_2^HH_1U_2 = U_2^H(U_1^HHU_1)U_2 = (U_1U_2)^HH(U_1U_2) = Z^THZ$$

is real.

Unfortunately, roundoff error almost always prevents an exact return to the real field. A real $H_2$ could be guaranteed if we

• explicitly form the real matrix $M = H^2 - sH + tI$,
• compute the real QR factorization $M = ZR$, and
• set $H_2 = Z^THZ$.

But since the first of these steps requires $O(n^3)$ flops, this is not a practical course of action.


7.5.5 The Double Implicit Shift Strategy 


Fortunately, it turns out that we can implement the double shift step with $O(n^2)$ flops by appealing to the Implicit Q Theorem of §7.4.5. In particular we can effect the transition from $H$ to $H_2$ in $O(n^2)$ flops if we

• compute $Me_1$, the first column of $M$;
• determine a Householder matrix $P_0$ such that $P_0(Me_1)$ is a multiple of $e_1$;
• compute Householder matrices $P_1, \ldots, P_{n-2}$ such that if $Z_1$ is the product $Z_1 = P_0P_1\cdots P_{n-2}$, then $Z_1^THZ_1$ is upper Hessenberg and the first columns of $Z$ and $Z_1$ are the same.

Under these circumstances, the Implicit Q theorem permits us to conclude that if $Z^THZ$ and $Z_1^THZ_1$ are both unreduced upper Hessenberg matrices, then they are essentially equal. Note that if these Hessenberg matrices are not unreduced, then we can effect a decoupling and proceed with smaller unreduced subproblems.

Let us work out the details. Observe first that $P_0$ can be determined in $O(1)$ flops since $Me_1 = [x, y, z, 0, \ldots, 0]^T$ where

$$\begin{aligned} x &= h_{11}^2 + h_{12}h_{21} - sh_{11} + t \\ y &= h_{21}(h_{11} + h_{22} - s) \\ z &= h_{21}h_{32}. \end{aligned}$$
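The three formulas can be checked against the definition $M = H^2 - sH + tI$. The Python sketch below evaluates $Me_1$ both ways on a small integer Hessenberg matrix. The matrix and the function names are illustrative choices. The cheap version touches only $h_{11}, h_{12}, h_{21}, h_{22}, h_{32}$ and the trailing 2-by-2 block. The reference version forms $H^2$ explicitly, which is the $O(n^3)$ work the implicit approach avoids.

```python
def first_column_of_M(H):
    """First column of M = H^2 - s*H + t*I from the O(1) formulas
    (0-based indices; H upper Hessenberg with n >= 3)."""
    n = len(H)
    m = n - 2                                                  # 0-based row n-1
    s = H[m][m] + H[n - 1][n - 1]                              # trace(G)
    t = H[m][m] * H[n - 1][n - 1] - H[m][n - 1] * H[n - 1][m]  # det(G)
    x = H[0][0] * H[0][0] + H[0][1] * H[1][0] - s * H[0][0] + t
    y = H[1][0] * (H[0][0] + H[1][1] - s)
    z = H[1][0] * H[2][1]
    return [x, y, z] + [0] * (n - 3)


def first_column_explicit(H):
    """Reference: form H^2 explicitly and take the first column of M."""
    n = len(H)
    m = n - 2
    s = H[m][m] + H[n - 1][n - 1]
    t = H[m][m] * H[n - 1][n - 1] - H[m][n - 1] * H[n - 1][m]
    H2 = [[sum(H[i][k] * H[k][j] for k in range(n)) for j in range(n)]
          for i in range(n)]
    return [H2[i][0] - s * H[i][0] + (t if i == 0 else 0) for i in range(n)]


H = [[3, 1, 4, 1, 5],
     [9, 2, 6, 5, 3],
     [0, 5, 8, 9, 7],
     [0, 0, 9, 3, 2],
     [0, 0, 0, 3, 8]]
print(first_column_of_M(H))                               # [3, -54, 45, 0, 0]
print(first_column_of_M(H) == first_column_explicit(H))   # True
```

Only the first three components can be nonzero because $He_1$ has two nonzeros and $H$ is Hessenberg. That is why $P_0$ is effectively a 3-by-3 reflector.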
Since a similarity transformation with $P_0$ only changes rows and columns 1, 2, and 3, we see that

    P_0 H P_0 =
        x x x x x x
        x x x x x x
        x x x x x x
        x x x x x x
        0 0 0 x x x
        0 0 0 0 x x

Now the mission of the Householder matrices $P_1, \ldots, P_{n-2}$ is to restore this matrix to upper Hessenberg form. The calculation proceeds as follows:

    After P_1:          After P_2:          After P_3:          After P_4:
    x x x x x x         x x x x x x         x x x x x x         x x x x x x
    x x x x x x         x x x x x x         x x x x x x         x x x x x x
    0 x x x x x         0 x x x x x         0 x x x x x         0 x x x x x
    0 x x x x x         0 0 x x x x         0 0 x x x x         0 0 x x x x
    0 x x x x x         0 0 x x x x         0 0 0 x x x         0 0 0 x x x
    0 0 0 0 x x         0 0 x x x x         0 0 0 x x x         0 0 0 0 x x


Clearly, the general $P_k$ has the form $P_k = \mathrm{diag}(I_k, \tilde P_k, I_{n-k-3})$ where $\tilde P_k$ is a 3-by-3 Householder matrix. For example,

    P_1 =
        1 0 0 0 0 0
        0 x x x 0 0
        0 x x x 0 0
        0 x x x 0 0
        0 0 0 0 1 0
        0 0 0 0 0 1

Note that $P_{n-2}$ is an exception to this since $P_{n-2} = \mathrm{diag}(I_{n-2}, \tilde P_{n-2})$.

The applicability of Theorem 7.4.2 (the Implicit Q theorem) follows from the observation that $P_ke_1 = e_1$ for $k = 1{:}n-2$ and that $P_0$ and $Z$ have the same first column, both being normalized multiples of $Me_1$. Hence, $Z_1e_1 = Ze_1$, and we can assert that $Z_1$ essentially equals $Z$ provided that the upper Hessenberg matrices $Z^THZ$ and $Z_1^THZ_1$ are each unreduced.

The implicit determination of $H_2$ from $H$ outlined above was first described by Francis (1961) and we refer to it as a Francis QR step. The complete Francis step is summarized as follows:


Algorithm 7.5.1 (Francis QR Step) Given the unreduced upper Hessenberg matrix $H \in \mathbb{R}^{n\times n}$ whose trailing 2-by-2 principal submatrix has eigenvalues $a_1$ and $a_2$, this algorithm overwrites $H$ with $Z^THZ$, where $Z = P_0\cdots P_{n-2}$ is a product of Householder matrices and $Z^T(H - a_1I)(H - a_2I)$ is upper triangular.

    m = n - 1
    {Compute first column of (H - a_1 I)(H - a_2 I).}
    s = H(m,m) + H(n,n)
    t = H(m,m)H(n,n) - H(m,n)H(n,m)
    x = H(1,1)H(1,1) + H(1,2)H(2,1) - sH(1,1) + t
    y = H(2,1)(H(1,1) + H(2,2) - s)
    z = H(2,1)H(3,2)
    for k = 0:n-3
        [v, β] = house([x y z]^T)
        q = max{1, k}
        H(k+1:k+3, q:n) = (I - βvv^T)H(k+1:k+3, q:n)
        r = min{k+4, n}
        H(1:r, k+1:k+3) = H(1:r, k+1:k+3)(I - βvv^T)
        x = H(k+2, k+1)
        y = H(k+3, k+1)
        if k < n-3
            z = H(k+4, k+1)
        end
    end
    [v, β] = house([x y]^T)
    H(n-1:n, n-2:n) = (I - βvv^T)H(n-1:n, n-2:n)
    H(1:n, n-1:n) = H(1:n, n-1:n)(I - βvv^T)

This algorithm requires $10n^2$ flops. If $Z$ is accumulated into a given orthogonal matrix, an additional $10n^2$ flops are necessary.
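A direct Python transcription of Algorithm 7.5.1 with 0-based indices is sketched below. The function `house` follows the convention of Algorithm 5.1.1 ($v_1 = 1$) as we understand it. The names `reflect_rows`, `reflect_cols`, and `francis_step` are our own. Unlike the pseudocode, the sketch explicitly sets the entries annihilated by each reflector to zero, so the Hessenberg pattern is exact. The usage lines apply repeated Francis steps to the 5-by-5 matrix of Example 7.5.3 below, without the deflation logic of Algorithm 7.5.2. The $(4,3)$ entry should become negligible within a few steps while the trace stays fixed.

```python
import math


def house(x):
    """Householder vector (v[0] = 1) and beta with (I - beta v v^T) x = multiple of e1."""
    sigma = sum(t * t for t in x[1:])
    if sigma == 0.0:
        return [1.0] + [0.0] * (len(x) - 1), 0.0
    mu = math.sqrt(x[0] * x[0] + sigma)
    v1 = x[0] - mu if x[0] <= 0 else -sigma / (x[0] + mu)
    beta = 2 * v1 * v1 / (sigma + v1 * v1)
    return [1.0] + [t / v1 for t in x[1:]], beta


def reflect_rows(H, v, beta, rows, cols):
    """H(rows, cols) = (I - beta v v^T) H(rows, cols)."""
    for j in cols:
        w = beta * sum(vi * H[r][j] for vi, r in zip(v, rows))
        for vi, r in zip(v, rows):
            H[r][j] -= w * vi


def reflect_cols(H, v, beta, rows, cols):
    """H(rows, cols) = H(rows, cols) (I - beta v v^T)."""
    for i in rows:
        w = beta * sum(H[i][c] * vi for vi, c in zip(v, cols))
        for vi, c in zip(v, cols):
            H[i][c] -= w * vi


def francis_step(H):
    """Algorithm 7.5.1 with 0-based indices; H (n >= 3) is overwritten."""
    n = len(H)
    m = n - 2
    s = H[m][m] + H[n - 1][n - 1]
    t = H[m][m] * H[n - 1][n - 1] - H[m][n - 1] * H[n - 1][m]
    x = H[0][0] * H[0][0] + H[0][1] * H[1][0] - s * H[0][0] + t
    y = H[1][0] * (H[0][0] + H[1][1] - s)
    z = H[1][0] * H[2][1]
    for k in range(n - 2):                        # book's k = 0:n-3
        v, beta = house([x, y, z])
        q = max(0, k - 1)                         # book's q = max{1, k}
        reflect_rows(H, v, beta, range(k, k + 3), range(q, n))
        if k > 0:                                 # bulge entries just annihilated
            H[k + 1][k - 1] = H[k + 2][k - 1] = 0.0
        r = min(k + 4, n)                         # book's r = min{k+4, n}
        reflect_cols(H, v, beta, range(r), range(k, k + 3))
        x, y = H[k + 1][k], H[k + 2][k]
        if k < n - 3:
            z = H[k + 3][k]
    v, beta = house([x, y])
    reflect_rows(H, v, beta, range(n - 2, n), range(n - 3, n))
    H[n - 1][n - 3] = 0.0
    reflect_cols(H, v, beta, range(n), range(n - 2, n))
    return H


H = [[2.0, 3.0, 4.0, 5.0, 6.0],
     [4.0, 4.0, 5.0, 6.0, 7.0],
     [0.0, 3.0, 6.0, 7.0, 8.0],
     [0.0, 0.0, 2.0, 8.0, 9.0],
     [0.0, 0.0, 0.0, 1.0, 10.0]]
for it in range(1, 9):
    francis_step(H)
    print(it, "  ".join(f"{abs(H[i + 1][i]):.0e}" for i in range(4)))
```

Every reflector touches at most three rows and three columns of a band around the bulge. That locality is where the $10n^2$ flop count comes from.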


7.5.6 The Overall Process 


Reducing A to Hessenberg form using Algorithm 7.4.2 and then iterating 
with Algorithm 7.5.1 to produce the real Schur form is the standard means 
by which the dense unsymmetric eigenproblem is solved. During the iter- 
ation it is necessary to monitor the subdiagonal elements in H in order to 
spot any possible decoupling. How this is done is illustrated in the following
algorithm: 


Algorithm 7.5.2 (QR Algorithm) Given $A \in \mathbb{R}^{n\times n}$ and a tolerance tol greater than the unit roundoff, this algorithm computes the real Schur canonical form $Q^TAQ = T$. $A$ is overwritten with the Hessenberg decomposition. If $Q$ and $T$ are desired, then $T$ is stored in $H$. If only the eigenvalues are desired, then diagonal blocks in $T$ are stored in the corresponding positions in $H$.

    Use Algorithm 7.4.2 to compute the Hessenberg reduction
        H = U_0^T A U_0 where U_0 = P_1 ··· P_{n-2}.
    If Q is desired form Q = P_1 ··· P_{n-2}. See §5.1.6.
    until q = n
        Set to zero all subdiagonal elements that satisfy:
            |h_{i,i-1}| ≤ tol(|h_ii| + |h_{i-1,i-1}|).
        Find the largest non-negative q and the smallest
        non-negative p such that

                [ H_11  H_12  H_13 ]   p
            H = [  0    H_22  H_23 ]   n-p-q
                [  0     0    H_33 ]   q
                   p    n-p-q   q

        where H_33 is upper quasi-triangular and H_22 is
        unreduced. (Note: either p or q may be zero.)
        if q < n
            Perform a Francis QR step on H_22: H_22 = Z^T H_22 Z
            if Q is desired
                Q = Q diag(I_p, Z, I_q)
                H_12 = H_12 Z
                H_23 = Z^T H_23
            end
        end
    end
    Upper triangularize all 2-by-2 diagonal blocks in H that have
    real eigenvalues and accumulate the transformations
    if necessary.

This algorithm requires $25n^3$ flops if $Q$ and $T$ are computed. If only the eigenvalues are desired, then $10n^3$ flops are necessary. These flop counts are very approximate and are based on the empirical observation that on average only two Francis iterations are required before the lower 1-by-1 or 2-by-2 decouples.
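The bookkeeping step "find the largest non-negative $q$ and the smallest non-negative $p$" can be isolated as a small Python function. It assumes negligible subdiagonal entries have already been set to zero. It peels 1-by-1 and 2-by-2 blocks off the bottom of $H$ while the trailing part stays upper quasi-triangular. It then walks upward from the bottom of the remaining block for as long as the subdiagonal stays nonzero. The name `active_block` and the helper `hessenberg_with_subdiagonal` are illustrative. The sketch classifies blocks purely by their size, as the algorithm statement does.

```python
def active_block(H):
    """Return (p, q) for Algorithm 7.5.2: with 0-based indices, rows/columns
    p .. n-q-1 form the unreduced block H22 and the trailing q-by-q block H33
    is upper quasi-triangular. Negligible subdiagonals must already be zero."""
    n = len(H)
    end = n                                   # rows end .. n-1 form H33
    while end > 0:
        i = end - 1
        if i == 0 or H[i][i - 1] == 0:
            end -= 1                          # a 1-by-1 block splits off
        elif i == 1 or H[i - 1][i - 2] == 0:
            end -= 2                          # a 2-by-2 block splits off
        else:
            break                             # two consecutive nonzero subdiagonals
    q = n - end
    if q == n:
        return 0, n                           # fully deflated, H22 is empty
    start = end - 1
    while start > 0 and H[start][start - 1] != 0:
        start -= 1                            # extend H22 upward while unreduced
    return start, q


def hessenberg_with_subdiagonal(sub):
    """Test helper: ones on and above the diagonal, given subdiagonal."""
    n = len(sub) + 1
    H = [[1.0 if j >= i else 0.0 for j in range(n)] for i in range(n)]
    for i, h in enumerate(sub, start=1):
        H[i][i - 1] = h
    return H


print(active_block(hessenberg_with_subdiagonal([1.0, 1.0, 0.0, 1.0, 0.0])))  # (0, 3)
print(active_block(hessenberg_with_subdiagonal([0.0, 1.0, 1.0, 1.0])))       # (1, 0)
print(active_block(hessenberg_with_subdiagonal([0.0, 0.0, 0.0])))            # (0, 4)
```

In the first case the bottom row is a 1-by-1 block and rows 4 and 5 form a 2-by-2 block. That leaves a 3-by-3 unreduced $H_{22}$ at the top.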


Example 7.5.3 If Algorithm 7.5.2 is applied to

$$A = \begin{bmatrix} 2 & 3 & 4 & 5 & 6 \\ 4 & 4 & 5 & 6 & 7 \\ 0 & 3 & 6 & 7 & 8 \\ 0 & 0 & 2 & 8 & 9 \\ 0 & 0 & 0 & 1 & 10 \end{bmatrix},$$

then the subdiagonal entries converge as follows

    Iteration   O(|h21|)   O(|h32|)   O(|h43|)   O(|h54|)
        1        10^0       10^0       10^0       10^0
        2        10^0       10^0       10^0       10^0
        3        10^0       10^0       10^-1      10^0
        4        10^0       10^0       10^-3      10^-3
        5        10^0       10^0       10^-6      10^-5
        6        10^-1      10^0       10^-13     10^-13
        7        10^-1      10^0       10^-28     10^-13
        8        10^-4      10^0       converg.   converg.
        9        10^-8      10^0
       10        10^-8      10^0
       11        10^-16     10^0
       12        10^-32     10^0
       13        converg.   converg.


The roundoff properties of the QR algorithm are what one would expect of any orthogonal matrix technique. The computed real Schur form $\hat T$ is orthogonally similar to a matrix near to $A$, i.e.,

$$Q^T(A + E)Q = \hat T$$

where $Q^TQ = I$ and $\|E\|_2 \approx u\|A\|_2$. The computed $\hat Q$ is almost orthogonal in the sense that $\hat Q^T\hat Q = I + F$ where $\|F\|_2 \approx u$.

The order of the eigenvalues along $\hat T$ is somewhat arbitrary. But as we discuss in §7.6, any ordering can be achieved by using a simple procedure for swapping two adjacent diagonal entries.


7.5.7 Balancing


Finally, we mention that if the elements of $A$ have widely varying magnitudes, then $A$ should be balanced before applying the QR algorithm. This is an $O(n^2)$ calculation in which a diagonal matrix $D$ is computed so that if

$$D^{-1}AD = [\,c_1, \ldots, c_n\,] = \begin{bmatrix} r_1^T \\ \vdots \\ r_n^T \end{bmatrix}$$

then $\|r_i\|_\infty \approx \|c_i\|_\infty$ for $i = 1{:}n$. The diagonal matrix $D$ is chosen to have the form $D = \mathrm{diag}(\beta^{i_1}, \ldots, \beta^{i_n})$ where $\beta$ is the floating point base. Note that $D^{-1}AD$ can be calculated without roundoff. When $A$ is balanced, the computed eigenvalues are often more accurate. See Parlett and Reinsch (1969).
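A simplified balancing sweep in the spirit of Parlett and Reinsch (1969) is sketched below in Python. For each index $i$ it searches for a power $f$ of the base that brings the off-diagonal norms of row $i$ and column $i$ within a factor of the base of each other. It then scales row $i$ by $1/f$ and column $i$ by $f$, but only if that lowers their combined norm noticeably. Several details are assumptions of this sketch rather than statements from the text. It uses off-diagonal 1-norms instead of the $\infty$-norms quoted above. The 0.95 acceptance threshold is a conventional choice. The permutation step that production balancing codes perform first is omitted. Because every scale factor is a power of 2, the scaling introduces no roundoff.

```python
def balance(A, radix=2.0):
    """Return (B, d) with B = D^{-1} A D, D = diag(d), each d[i] a power of radix.
    Simplified sweep: off-diagonal 1-norms, no preliminary permutations."""
    n = len(A)
    B = [[float(v) for v in row] for row in A]
    d = [1.0] * n
    converged = False
    while not converged:
        converged = True
        for i in range(n):
            c = sum(abs(B[j][i]) for j in range(n) if j != i)   # column i
            r = sum(abs(B[i][j]) for j in range(n) if j != i)   # row i
            if c == 0.0 or r == 0.0:
                continue
            s, f = c + r, 1.0
            while c < r / radix:          # column too small: scale it up
                c *= radix * radix
                f *= radix
            while c >= r * radix:         # column too large: scale it down
                c /= radix * radix
                f /= radix
            if (c + r) / f < 0.95 * s:    # accept only a worthwhile reduction
                converged = False
                d[i] *= f
                for j in range(n):
                    B[i][j] /= f          # row i of D^{-1} A D
                    B[j][i] *= f          # column i (diagonal entry is unchanged)
    return B, d


A = [[1.0, 1024.0],
     [1.0 / 1024.0, 2.0]]
B, d = balance(A)
print(B)   # [[1.0, 1.0], [1.0, 2.0]]
print(d)   # [1024.0, 1.0]
```

The diagonal entries, and hence the eigenvalues, are untouched. Only the off-diagonal magnitudes are equalized, which is what reduces the norm that governs the QR algorithm's backward error.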


Problems 


P7.5.1 Show that if $\bar H = Q^THQ$ is obtained by performing a single-shift QR step with

$$H = \begin{bmatrix} w & x \\ y & z \end{bmatrix},$$

then $|\bar h_{21}| \leq |y^2x|/[(w - z)^2 + y^2]$.

P7.5.2 Give a formula for the 2-by-2 diagonal matrix $D$ that minimizes $\|D^{-1}AD\|_F$ where

$$A = \begin{bmatrix} w & x \\ y & z \end{bmatrix}.$$

P7.5.3 Explain how the single-shift QR step $H - \mu I = UR$, $\bar H = RU + \mu I$ can be carried out implicitly. That is, show how the transition from $H$ to $\bar H$ can be carried out without subtracting the shift $\mu$ from the diagonal of $H$.

P7.5.4 Suppose $H$ is upper Hessenberg and that we compute the factorization $PH = LU$ via Gaussian elimination with partial pivoting. (See Algorithm 4.3.4.) Show that $H_1 = U(P^TL)$ is upper Hessenberg and similar to $H$. (This is the basis of the modified LR algorithm.)

P7.5.5 Show that if $H = H_0$ is given and we generate the matrices $H_k$ via $H_k - \mu_kI = U_kR_k$, $H_{k+1} = R_kU_k + \mu_kI$, then

$$(U_0\cdots U_{j-1})(R_{j-1}\cdots R_0) = (H - \mu_0I)\cdots(H - \mu_{j-1}I).$$


Notes and References for Sec. 7.5 


The development of the practical QR algorithm began with the important paper 


H. Rutishauser (1958). “Solution of Eigenvalue Problems with the LR Transformation,” 
Nat. Bur. Stand. App. Math. Ser. 49, 47-81. 


The algorithm described here was then "orthogonalized" in 


J.G.F. Francis (1961). “The QR Transformation: A Unitary Analogue to the LR Transformation, Parts I and II,” Comp. J. 4, 265-72, 332-45.


Descriptions of the practical QR algorithm may be found in Wilkinson (1965), Stewart (1973), and Watkins (1991). See also


D. Watkins and L. Elsner (1991). “Chasing Algorithms for the Eigenvalue Problem,” 
SIAM J. Matrix Anal. Appl. 12, 374-384.

D.S. Watkins and L. Elsner (1991). “Convergence of Algorithms of Decomposition Type 
for the Eigenvalue Problem,” Lin.Alg. and Its Application 143, 19-47. 

J. Erxiong (1992). “A Note on the Double-Shift QL Algorithm,” Lin.Alg. and Its 
Application 171, 121-132. 


Algol procedures for LR and QR methods are given in 


R.S. Martin and J.H. Wilkinson (1968). “The Modified LR Algorithm for Complex Hessenberg Matrices,” Numer. Math. 12, 369-76. See also Wilkinson and Reinsch (1971, pp. 396-403).
R.S. Martin, G. Peters, and J.H. Wilkinson (1970). “The QR Algorithm for Real Hessenberg Matrices,” Numer. Math. 14, 219-31. See also Wilkinson and Reinsch (1971, pp. 359-71).


Aspects of the balancing problem are discussed in 


E.E. Osborne (1960). “On Preconditioning of Matrices,” JACM 7, 338-45. 

B.N. Parlett and C. Reinsch (1969). “Balancing a Matrix for Calculation of Eigen- 
values and Eigenvectors,” Numer. Math. 13, 292-304. See also Wilkinson and 
Reinsch (1971, pp. 315-26).


High performance eigenvalue solver papers include 


Z. Bai and J.W. Demmel (1989). “On a Block Implementation of Hessenberg Multishift 
QR Iteration,” Int'l J. of High Speed Comput. 1, 97-112. 

G. Shroff (1991). “A Parallel Algorithm for the Eigenvalues and Eigenvectors of a 
General Complex Matrix,” Numer. Math. 58, 779-806. 

R.A. van de Geijn (1993). “Deferred Shifting Schemes for Parallel QR Methods,” SIAM J. Matrix Anal. Appl. 14, 180-194.

A.A. Dubrulle and G.H. Golub (1994). “A Multishift QR Iteration Without Computa- 
tion of the Shifts,” Numerical Algorithms 7, 173-181. 


7.6 Invariant Subspace Computations 


Several important invariant subspace problems can be solved once the real Schur decomposition $Q^TAQ = T$ has been computed. In this section we discuss how to

• compute the eigenvectors associated with some subset of $\lambda(A)$,
• compute an orthonormal basis for a given invariant subspace,
• block-diagonalize $A$ using well-conditioned similarity transformations,
• compute a basis of eigenvectors regardless of their condition, and
• compute an approximate Jordan canonical form of $A$.

Eigenvector/invariant subspace computation for sparse matrices is discussed elsewhere. See §7.3 as well as portions of Chapters 8 and 9.


7.6.1 Selected Eigenvectors via Inverse Iteration 


Let $q^{(0)} \in \mathbb{C}^n$ be a given unit 2-norm vector and assume that $A - \mu I \in \mathbb{R}^{n\times n}$ is nonsingular. The following is referred to as inverse iteration:

    for k = 1, 2, ...
        Solve (A - μI) z^(k) = q^(k-1)
        q^(k) = z^(k) / || z^(k) ||_2              (7.6.1)
        λ^(k) = [q^(k)]^T A q^(k)
    end

Inverse iteration is just the power method applied to $(A - \mu I)^{-1}$.
To analyze the behavior of (7.6.1), assume that $A$ has a basis of eigenvectors $\{x_1, \ldots, x_n\}$ and that $Ax_i = \lambda_ix_i$ for $i = 1{:}n$. If

$$q^{(0)} = \sum_{i=1}^n \beta_ix_i$$

then $q^{(k)}$ is a unit vector in the direction of

$$(A - \mu I)^{-k}q^{(0)} = \sum_{i=1}^n \frac{\beta_i}{(\lambda_i - \mu)^k}\,x_i .$$

Clearly, if $\mu$ is much closer to an eigenvalue $\lambda_j$ than to the other eigenvalues, then $q^{(k)}$ is rich in the direction of $x_j$ provided $\beta_j \neq 0$.

A sample stopping criterion for (7.6.1) might be to quit as soon as the residual

$$r^{(k)} = (A - \mu I)q^{(k)}$$

satisfies

$$\|r^{(k)}\|_\infty \leq cu\|A\|_\infty \qquad (7.6.2)$$

where $c$ is a constant of order unity. Since

$$(A + E_k)q^{(k)} = \mu q^{(k)}$$

with $E_k = -r^{(k)}q^{(k)T}$ (and $\|E_k\|_2 = \|r^{(k)}\|_2$ because $\|q^{(k)}\|_2 = 1$), it follows that (7.6.2) forces $\mu$ and $q^{(k)}$ to be an exact eigenpair for a nearby matrix.

Inverse iteration can be used in conjunction with the QR algorithm as follows:

• Compute the Hessenberg decomposition $U_0^TAU_0 = H$.
• Apply the double implicit shift Francis iteration to $H$ without accumulating transformations.
• For each computed eigenvalue $\lambda$ whose corresponding eigenvector $x$ is sought, apply (7.6.1) with $A = H$ and $\mu = \lambda$ to produce a vector $z$ such that $Hz \approx \mu z$.
• Set $x = U_0z$.

Inverse iteration with $H$ is very economical because (1) we do not have to accumulate transformations during the double Francis iteration; (2) we can factor matrices of the form $H - \lambda I$ in $O(n^2)$ flops, and (3) only one iteration is typically required to produce an adequate approximate eigenvector.
This last point is perhaps the most interesting aspect of inverse iteration and requires some justification since $\lambda$ can be comparatively inaccurate if it is ill-conditioned. Assume for simplicity that $\lambda$ is real and let

$$H - \lambda I = \sum_{i=1}^n \sigma_iu_iv_i^T = U\Sigma V^T$$

be the SVD of $H - \lambda I$. From what we said about the roundoff properties of the QR algorithm in §7.5.6, there exists a matrix $E \in \mathbb{R}^{n\times n}$ such that $H + E - \lambda I$ is singular and $\|E\|_2 \approx u\|H\|_2$. It follows that $\sigma_n \approx u\sigma_1$ and $\|(H - \lambda I)v_n\|_2 \approx u\sigma_1$, i.e., $v_n$ is a good approximate eigenvector. Clearly if the starting vector $q^{(0)}$ has the expansion

$$q^{(0)} = \sum_{i=1}^n \gamma_iu_i$$

then

$$z^{(1)} = \sum_{i=1}^n \frac{\gamma_i}{\sigma_i}\,v_i$$

is "rich" in the direction $v_n$. Note that if $s(\lambda) \approx |u_n^Tv_n|$ is small, then $z^{(1)}$ is rather deficient in the direction $u_n$. This explains (heuristically) why another step of inverse iteration is not likely to produce an improved eigenvector approximate, especially if $\lambda$ is ill-conditioned. For more details, see Peters and Wilkinson (1979).


Example 7.6.1 The matrix

$$A = \begin{bmatrix} 1 & 1 \\ 10^{-10} & 1 \end{bmatrix}$$

has eigenvalues $\lambda_1 = .99999$ and $\lambda_2 = 1.00001$ and corresponding eigenvectors $x_1 = [1, -10^{-5}]^T$ and $x_2 = [1, 10^{-5}]^T$. The condition of both eigenvalues is of order $10^5$. The approximate eigenvalue $\mu = 1$ is an exact eigenvalue of $A + E$ where

$$E = \begin{bmatrix} 0 & 0 \\ -10^{-10} & 0 \end{bmatrix}.$$

Thus, the quality of $\mu$ is typical of the quality of an eigenvalue produced by the QR algorithm when executed in 10-digit floating point.

If (7.6.1) is applied with starting vector $q^{(0)} = [0, 1]^T$, then $q^{(1)} = [1, 0]^T$ and $\|Aq^{(1)} - \mu q^{(1)}\|_2 = 10^{-10}$. However, one more step produces $q^{(2)} = [0, 1]^T$ for which $\|Aq^{(2)} - \mu q^{(2)}\|_2 = 1$. This example is discussed in Peters and Wilkinson (1979).
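Example 7.6.1 can be replayed in a few lines. The sketch below is a minimal illustration, assuming that one step of (7.6.1) means solving $(A - \mu I)z = q^{(k-1)}$ and setting $q^{(k)} = z/\|z\|_2$. The helper name `inverse_iteration_step` is ours, and `numpy.linalg.solve` (a general LU solver with partial pivoting) stands in for the $O(n^2)$ Hessenberg factorization the text has in mind.

```python
import numpy as np

def inverse_iteration_step(A, mu, q):
    """One step of inverse iteration: solve (A - mu*I) z = q, then normalize z."""
    n = A.shape[0]
    z = np.linalg.solve(A - mu * np.eye(n), q)
    return z / np.linalg.norm(z)

A = np.array([[1.0, 1.0],
              [1e-10, 1.0]])
mu = 1.0
q0 = np.array([0.0, 1.0])

q1 = inverse_iteration_step(A, mu, q0)      # approximately [1, 0]
r1 = np.linalg.norm(A @ q1 - mu * q1)       # about 1e-10: q1 looks excellent
q2 = inverse_iteration_step(A, mu, q1)      # approximately [0, 1]
r2 = np.linalg.norm(A @ q2 - mu * q2)       # about 1: the second step made things worse
print(q1, r1)
print(q2, r2)
```

The residual after the first step is tiny, yet a second step throws the iterate into the direction that the ill-conditioned pair cannot resolve, which is exactly the behavior the heuristic argument above predicts.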
7.6.2 Ordering Eigenvalues in the Real Schur Form


Recall that the real Schur decomposition provides information about invariant subspaces. If

$$Q^T A Q = T = \begin{bmatrix} T_{11} & T_{12} \\ 0 & T_{22} \end{bmatrix}, \qquad T_{11} \in \mathbb{R}^{p\times p},\; T_{22} \in \mathbb{R}^{q\times q},$$

and $\lambda(T_{11}) \cap \lambda(T_{22}) = \emptyset$, then the first p columns of Q span the unique invariant subspace associated with $\lambda(T_{11})$. (See §7.1.4.) Unfortunately, the Francis iteration supplies us with a real Schur decomposition $Q_F^T A Q_F = T_F$ in which the eigenvalues appear somewhat randomly along the diagonal of $T_F$. This poses a problem if we want an orthonormal basis for an invariant subspace whose associated eigenvalues are not at the top of $T_F$'s diagonal. Clearly, we need a method for computing an orthogonal matrix $Q_D$ such that $Q_D^T T_F Q_D$ is upper quasi-triangular with appropriate eigenvalue ordering.

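A look at the 2-by-2 case suggests how this can be accomplished. Suppose

$$Q_F^T A Q_F = T_F = \begin{bmatrix} \lambda_1 & t_{12} \\ 0 & \lambda_2 \end{bmatrix}, \qquad \lambda_1 \neq \lambda_2,$$

and that we wish to reverse the order of the eigenvalues. Note that $T_F x = \lambda_2 x$ where

$$x = \begin{bmatrix} t_{12} \\ \lambda_2 - \lambda_1 \end{bmatrix}.$$

Let $Q_D$ be a Givens rotation such that the second component of $Q_D^T x$ is zero, so that $Q_D e_1$ is a multiple of the eigenvector x. If $Q = Q_F Q_D$ then

$$(Q^T A Q) e_1 = Q_D^T T_F (Q_D e_1) = \lambda_2 Q_D^T (Q_D e_1) = \lambda_2 e_1$$

and so $Q^T A Q$ must have the form

$$Q^T A Q = \begin{bmatrix} \lambda_2 & \pm t_{12} \\ 0 & \lambda_1 \end{bmatrix}.$$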

By systematically interchanging adjacent pairs of eigenvalues using this technique, we can move any subset of $\lambda(A)$ to the top of T's diagonal, assuming that no 2-by-2 bumps are encountered along the way.


Algorithm 7.6.1 Given an orthogonal matrix $Q \in \mathbb{R}^{n\times n}$, an upper triangular matrix $T = Q^T A Q$, and a subset $\Delta = \{\lambda_1, \ldots, \lambda_p\}$ of $\lambda(A)$, the following algorithm computes an orthogonal matrix $Q_D$ such that $Q_D^T T Q_D = S$ is upper triangular and $\{s_{11}, \ldots, s_{pp}\} = \Delta$. The matrices Q and T are overwritten by $Q Q_D$ and S respectively.

while {t_11, ..., t_pp} ≠ Δ
    for k = 1:n−1
        if t_kk ∉ Δ and t_{k+1,k+1} ∈ Δ
            [c, s] = givens(T(k, k+1), T(k+1, k+1) − T(k, k))
            T(k:k+1, k:n) = [c s; −s c]^T T(k:k+1, k:n)
            T(1:k+1, k:k+1) = T(1:k+1, k:k+1) [c s; −s c]
            Q(1:n, k:k+1) = Q(1:n, k:k+1) [c s; −s c]
        end
    end
end


This algorithm requires k(12n) flops, where k is the total number of required swaps. The integer k is never greater than (n − p)p.
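A minimal NumPy sketch of Algorithm 7.6.1 for the triangular case follows. The names `givens` and `reorder_schur` are hypothetical; `givens` follows the convention $[c\ s; -s\ c]^T [a; b] = [r; 0]$ used by the pseudocode. As an assumption of this sketch, membership in $\Delta$ is tracked with a boolean mask that is swapped along with the diagonal entries, rather than by comparing floating-point values of $t_{kk}$ against $\Delta$.

```python
import numpy as np

def givens(a, b):
    """c, s with [[c, s], [-s, c]].T @ [a, b] = [r, 0]."""
    if b == 0.0:
        return 1.0, 0.0
    if abs(b) > abs(a):
        tau = -a / b
        s = 1.0 / np.sqrt(1.0 + tau * tau)
        return s * tau, s
    tau = -b / a
    c = 1.0 / np.sqrt(1.0 + tau * tau)
    return c, c * tau

def reorder_schur(Q, T, select):
    """Move the diagonal entries flagged in `select` to the top of upper triangular T.

    Returns (Q Q_D, Q_D^T T Q_D). Assumes distinct eigenvalues (no 2-by-2 blocks)."""
    Q = np.array(Q, dtype=float)
    T = np.array(T, dtype=float)
    sel = list(select)
    n, p = T.shape[0], sum(sel)
    while not all(sel[:p]):
        for k in range(n - 1):
            if not sel[k] and sel[k + 1]:
                c, s = givens(T[k, k + 1], T[k + 1, k + 1] - T[k, k])
                G = np.array([[c, s], [-s, c]])
                T[k:k + 2, k:] = G.T @ T[k:k + 2, k:]
                T[:k + 2, k:k + 2] = T[:k + 2, k:k + 2] @ G
                Q[:, k:k + 2] = Q[:, k:k + 2] @ G
                T[k + 1, k] = 0.0                  # zero in exact arithmetic
                sel[k], sel[k + 1] = sel[k + 1], sel[k]
    return Q, T

rng = np.random.default_rng(0)
T0 = np.triu(rng.standard_normal((5, 5)))
np.fill_diagonal(T0, [1.0, 2.0, 3.0, 4.0, 5.0])
Q0, _ = np.linalg.qr(rng.standard_normal((5, 5)))
A = Q0 @ T0 @ Q0.T
Q, T = reorder_schur(Q0, T0, [False, False, True, False, True])
print(np.round(np.diag(T), 12))   # 3 and 5 now lead the diagonal
```

Production libraries handle the 2-by-2 bump case as well; LAPACK's DTREXC, for example, reorders a real Schur factorization including 2-by-2 blocks.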

The swapping gets a little more complicated when T has 2-by-2 blocks along its diagonal. See Ruhe (1970) and Stewart (1976) for details. Of course, these interchanging techniques can be used to sort the eigenvalues, say from maximum to minimum modulus.

Computing invariant subspaces by manipulating the real Schur decomposition is extremely stable. If $\hat Q = [\hat q_1, \ldots, \hat q_n]$ denotes the computed orthogonal matrix Q, then $\|\hat Q^T \hat Q - I\|_2 \approx u$ and there exists a matrix E satisfying $\|E\|_2 \approx u\|A\|_2$ such that $(A + E)\hat q_i \in \mathrm{span}\{\hat q_1, \ldots, \hat q_p\}$ for $i = 1{:}p$.


7.6.3 Block Diagonalization 


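Let

$$T = \begin{bmatrix} T_{11} & T_{12} & \cdots & T_{1q} \\ 0 & T_{22} & \cdots & T_{2q} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & T_{qq} \end{bmatrix}, \qquad T_{ii} \in \mathbb{R}^{n_i \times n_i}, \tag{7.6.3}$$

be a partitioning of some real Schur canonical form $Q^T A Q = T \in \mathbb{R}^{n\times n}$ such that $\lambda(T_{11}), \ldots, \lambda(T_{qq})$ are disjoint. By Theorem 7.1.6 there exists a matrix Y such that $Y^{-1} T Y = \mathrm{diag}(T_{11}, \ldots, T_{qq})$. A practical procedure for determining Y is now given together with an analysis of Y's sensitivity as a function of the above partitioning.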

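Partition $I_n = [E_1, \ldots, E_q]$ conformably with T and define the matrix $Y_{ij} \in \mathbb{R}^{n\times n}$ as follows:

$$Y_{ij} = I_n + E_i Z_{ij} E_j^T, \qquad i < j, \quad Z_{ij} \in \mathbb{R}^{n_i \times n_j}.$$

In other words, $Y_{ij}$ looks just like the identity except that $Z_{ij}$ occupies the (i, j) block position. It follows that if $Y_{ij}^{-1} T Y_{ij} = \tilde T = (\tilde T_{ij})$ then T and $\tilde T$ are identical except that

$$\begin{aligned} \tilde T_{ij} &= T_{ii} Z_{ij} - Z_{ij} T_{jj} + T_{ij}, \\ \tilde T_{ik} &= T_{ik} - Z_{ij} T_{jk} \qquad (k = j+1{:}q), \\ \tilde T_{kj} &= T_{ki} Z_{ij} + T_{kj} \qquad (k = 1{:}i-1). \end{aligned}$$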


Thus, $\tilde T_{ij}$ can be zeroed provided we have an algorithm for solving the Sylvester equation

$$FZ - ZG = C \tag{7.6.4}$$

where $F \in \mathbb{R}^{p\times p}$ and $G \in \mathbb{R}^{r\times r}$ are given upper quasi-triangular matrices and $C \in \mathbb{R}^{p\times r}$.

Bartels and Stewart (1972) have devised a method for doing this. Let $C = [c_1, \ldots, c_r]$ and $Z = [z_1, \ldots, z_r]$ be column partitionings. If $g_{k+1,k} = 0$, then by comparing columns in (7.6.4) we find

$$F z_k - \sum_{i=1}^{k} g_{ik} z_i = c_k.$$

Thus, once we know $z_1, \ldots, z_{k-1}$ then we can solve the quasi-triangular system

$$(F - g_{kk} I)\, z_k = c_k + \sum_{i=1}^{k-1} g_{ik} z_i$$

for $z_k$. If $g_{k+1,k} \neq 0$, then $z_k$ and $z_{k+1}$ can be simultaneously found by solving the 2p-by-2p system

$$\begin{bmatrix} F - g_{kk} I & -g_{mk} I \\ -g_{km} I & F - g_{mm} I \end{bmatrix} \begin{bmatrix} z_k \\ z_m \end{bmatrix} = \begin{bmatrix} c_k \\ c_m \end{bmatrix} + \sum_{i=1}^{k-1} \begin{bmatrix} g_{ik} z_i \\ g_{im} z_i \end{bmatrix} \tag{7.6.5}$$

where m = k + 1. By reordering the equations according to the permutation (1, p+1, 2, p+2, ..., p, 2p), a banded system is obtained that can be solved in $O(p^2)$ flops. The details may be found in Bartels and Stewart (1972).
Here is the overall process for the case when F and G are each triangular. 


Algorithm 7.6.2 (Bartels-Stewart Algorithm) Given $C \in \mathbb{R}^{p\times r}$ and upper triangular matrices $F \in \mathbb{R}^{p\times p}$ and $G \in \mathbb{R}^{r\times r}$ that satisfy $\lambda(F) \cap \lambda(G) = \emptyset$, the following algorithm overwrites C with the solution to the equation FZ − ZG = C.

for k = 1:r
    C(1:p, k) = C(1:p, k) + C(1:p, 1:k−1) G(1:k−1, k)
    Solve (F − G(k, k) I) z = C(1:p, k) for z.
    C(1:p, k) = z
end


This algorithm requires pr(p + r) flops. 
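Algorithm 7.6.2 translates almost line for line into NumPy. In this sketch (function names are ours) the copy `Z` plays the role of the overwritten C, and each shifted triangular system is solved by explicit back substitution so that the triangular structure the algorithm relies on stays visible.

```python
import numpy as np

def back_substitute(U, b):
    """Solve U x = b for upper triangular U."""
    n = U.shape[0]
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x

def bartels_stewart_triangular(F, G, C):
    """Solve F Z - Z G = C for upper triangular F, G with disjoint spectra."""
    Z = np.array(C, dtype=float)        # C is not modified; Z is overwritten instead
    p, r = Z.shape
    I = np.eye(p)
    for k in range(r):
        # Columns 0..k-1 of Z already hold z_1, ..., z_{k-1}.
        Z[:, k] += Z[:, :k] @ G[:k, k]
        Z[:, k] = back_substitute(F - G[k, k] * I, Z[:, k])
    return Z

rng = np.random.default_rng(1)
F = np.triu(rng.standard_normal((4, 4)))
np.fill_diagonal(F, [5.0, 6.0, 7.0, 8.0])
G = np.triu(rng.standard_normal((3, 3)))
np.fill_diagonal(G, [-1.0, -2.0, -3.0])
C = rng.standard_normal((4, 3))
Z = bartels_stewart_triangular(F, G, C)
print(np.linalg.norm(F @ Z - Z @ G - C))   # residual at roundoff level
```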
By zeroing the superdiagonal blocks in T in the appropriate order, the entire matrix can be reduced to block diagonal form.


Algorithm 7.6.3 Given an orthogonal matrix $Q \in \mathbb{R}^{n\times n}$, an upper quasi-triangular matrix $T = Q^T A Q$, and the partitioning (7.6.3), the following algorithm overwrites Q with QY where $Y^{-1} T Y = \mathrm{diag}(T_{11}, \ldots, T_{qq})$.

for j = 2:q
    for i = 1:j−1
        Solve T_ii Z − Z T_jj = −T_ij for Z using Algorithm 7.6.2.
        for k = j+1:q
            T_ik = T_ik − Z T_jk
        end
        for k = 1:q
            Q_kj = Q_ki Z + Q_kj
        end
    end
end


The number of flops required by this algorithm is a complicated function of the block sizes in (7.6.3).
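The following sketch strings Algorithm 7.6.3 together for a triangular T. For brevity it swaps in a different Sylvester solver: each small equation $T_{ii}Z - ZT_{jj} = -T_{ij}$ is solved through its Kronecker-product form with a dense solve, which costs $O((n_i n_j)^3)$ and is only sensible for tiny blocks; Algorithm 7.6.2 is what one would use in practice. The function names and the `sizes` argument (the block sizes $n_1, \ldots, n_q$) are assumptions of the sketch.

```python
import numpy as np

def sylvester_kron(F, G, C):
    """Solve F Z - Z G = C via (I kron F - G^T kron I) vec(Z) = vec(C), column-major vec."""
    p, r = C.shape
    K = np.kron(np.eye(r), F) - np.kron(G.T, np.eye(p))
    z = np.linalg.solve(K, C.reshape(-1, order="F"))
    return z.reshape((p, r), order="F")

def block_diagonalize(Q, T, sizes):
    """Return (Q Y, Y^{-1} T Y) with Y^{-1} T Y block diagonal for the partition `sizes`."""
    Q = np.array(Q, dtype=float)
    T = np.array(T, dtype=float)
    edges = np.concatenate(([0], np.cumsum(sizes)))
    blk = [slice(edges[b], edges[b + 1]) for b in range(len(sizes))]
    q = len(sizes)
    for j in range(1, q):
        for i in range(j):
            Z = sylvester_kron(T[blk[i], blk[i]], T[blk[j], blk[j]], -T[blk[i], blk[j]])
            # New (i, j) block is T_ii Z - Z T_jj + T_ij, which is zero by construction.
            T[blk[i], blk[j]] = 0.0
            # Blocks T_kj with k < i need no update because T_ki is already zero.
            for k in range(j + 1, q):
                T[blk[i], blk[k]] -= Z @ T[blk[j], blk[k]]
            Q[:, blk[j]] += Q[:, blk[i]] @ Z
    return Q, T

rng = np.random.default_rng(2)
T0 = np.triu(rng.standard_normal((5, 5)))
np.fill_diagonal(T0, [1.0, 1.5, 4.0, 4.5, 9.0])      # three well-separated clusters
Q0, _ = np.linalg.qr(rng.standard_normal((5, 5)))
A = Q0 @ T0 @ Q0.T
QY, D = block_diagonalize(Q0, T0, [2, 2, 1])
print(np.round(D, 10))
```

Because $QY = Q_0 Y$ and $D = Y^{-1} T_0 Y$, the columns of `QY` satisfy $A(QY) = (QY)D$, which is an easy way to check the result.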

The choice of the real Schur form T and its partitioning in (7.6.3) determines the sensitivity of the Sylvester equations that must be solved in Algorithm 7.6.3. This in turn affects the condition of the matrix Y and the overall usefulness of the block diagonalization. The reason for these dependencies is that the relative error of the computed solution $\hat Z$ to

$$T_{ii} Z - Z T_{jj} = -T_{ij} \tag{7.6.6}$$

satisfies

$$\frac{\|\hat Z - Z\|_F}{\|Z\|_F} \approx u\, \frac{\|T\|_F}{\mathrm{sep}(T_{ii}, T_{jj})}.$$

For details, see Golub, Nash, and Van Loan (1979). Since

$$\mathrm{sep}(T_{ii}, T_{jj}) = \min_{X \neq 0} \frac{\|T_{ii} X - X T_{jj}\|_F}{\|X\|_F} \le \min_{\lambda \in \lambda(T_{ii}),\ \mu \in \lambda(T_{jj})} |\lambda - \mu|$$
there can be a substantial loss of accuracy whenever the subsets $\lambda(T_{ii})$ are insufficiently separated. Moreover, if Z satisfies (7.6.6) then

$$\|Z\|_F \le \frac{\|T_{ij}\|_F}{\mathrm{sep}(T_{ii}, T_{jj})}.$$

Thus, large-norm solutions can be expected if $\mathrm{sep}(T_{ii}, T_{jj})$ is small. This tends to make the matrix Y in Algorithm 7.6.3 ill-conditioned since it is the product of the matrices

$$Y_{ij} = \begin{bmatrix} I & Z \\ 0 & I \end{bmatrix}.$$

Note: $\kappa_F(Y_{ij}) = n + \|Z\|_F^2$, since $Y_{ij}^{-1}$ is $Y_{ij}$ with Z replaced by −Z and hence $\|Y_{ij}\|_F^2 = \|Y_{ij}^{-1}\|_F^2 = n + \|Z\|_F^2$.
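The gap between sep and the eigenvalue separation, and the resulting growth of $\|Z\|_F$ and $\kappa_F(Y_{ij})$, can be seen on a tiny hypothetical example. The sketch computes sep as the smallest singular value of the Kronecker matrix $I \otimes T_{ii} - T_{jj}^T \otimes I$, which is an equivalent characterization of the minimum above; the matrices `F`, `G`, and `Tij` are made up for illustration.

```python
import numpy as np

def sep(F, G):
    """sep(F, G) = min ||F X - X G||_F / ||X||_F as a smallest singular value."""
    p, r = F.shape[0], G.shape[0]
    K = np.kron(np.eye(r), F) - np.kron(G.T, np.eye(p))
    return np.linalg.svd(K, compute_uv=False)[-1]

F = np.array([[1.0, 100.0],
              [0.0, 1.1]])      # hypothetical T_ii: eigenvalues 1 and 1.1, strongly nonnormal
G = np.array([[2.0]])           # hypothetical T_jj: eigenvalue 2
Tij = np.array([[1.0], [1.0]])

gap = min(abs(l - m) for l in np.diag(F) for m in np.diag(G))   # 0.9
s = sep(F, G)                                                    # far smaller than gap

# G is 1-by-1, so F Z - Z G = -T_ij is a single shifted 2-by-2 system.
Z = np.linalg.solve(F - G[0, 0] * np.eye(2), -Tij)

Y = np.block([[np.eye(2), Z], [np.zeros((1, 2)), np.eye(1)]])
kappa_F = np.linalg.norm(Y, "fro") * np.linalg.norm(np.linalg.inv(Y), "fro")
print(gap, s, np.linalg.norm(Z), kappa_F)
```

Although the eigenvalues are separated by 0.9, sep is about 0.009, and $\|Z\|_F$ (hence the condition of Y) is correspondingly large.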

Confronted with these difficulties, Bavely and Stewart (1979) develop an algorithm for block diagonalizing that dynamically determines the eigenvalue ordering and partitioning in (7.6.3) so that all the Z matrices in Algorithm 7.6.3 are bounded in norm by some user-supplied tolerance. They find that the condition of Y can be controlled by controlling the condition of the $Y_{ij}$.


7.6.4 Eigenvector Bases 


If the blocks in the partitioning (7.6.3) are all 1-by-1, then Algorithm 7.6.3 produces a basis of eigenvectors. As with the method of inverse iteration, the computed eigenvalue-eigenvector pairs are exact for some "nearby" matrix. A widely followed rule of thumb for deciding upon a suitable eigenvector method is to use inverse iteration whenever fewer than 25% of the eigenvectors are desired.

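We point out, however, that the real Schur form can be used to determine selected eigenvectors. Suppose

$$Q^T A Q = \begin{bmatrix} T_{11} & u & T_{13} \\ 0 & \lambda & v^T \\ 0 & 0 & T_{33} \end{bmatrix}$$

(with diagonal blocks of order k − 1, 1, and n − k) is upper quasi-triangular and that $\lambda \notin \lambda(T_{11}) \cup \lambda(T_{33})$. It follows that if we solve the linear systems $(T_{11} - \lambda I) w = -u$ and $(T_{33} - \lambda I)^T z = -v$ then

$$x = Q \begin{bmatrix} w \\ 1 \\ 0 \end{bmatrix} \quad \text{and} \quad y = Q \begin{bmatrix} 0 \\ 1 \\ z \end{bmatrix}$$

are the associated right and left eigenvectors, respectively. Note that the condition of $\lambda$ is prescribed by $1/s(\lambda) = \sqrt{(1 + w^T w)(1 + z^T z)}$.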
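A sketch of this eigenvector recipe in NumPy, assuming the real Schur form is actually triangular (real eigenvalues) and that the chosen diagonal entry is simple. The function names are hypothetical and indices are 0-based, so `k` here corresponds to k − 1 in the display above.

```python
import numpy as np

def _solve(M, b):
    # An empty leading or trailing block contributes an empty solution vector.
    return np.linalg.solve(M, b) if b.size else np.zeros(0)

def schur_eigvecs(Q, T, k):
    """Right/left eigenvectors x, y and 1/s(lam) for lam = T[k, k], where A = Q T Q^T."""
    n = T.shape[0]
    lam = T[k, k]
    u = T[:k, k]            # column above lam
    v = T[k, k + 1:]        # row to the right of lam
    w = _solve(T[:k, :k] - lam * np.eye(k), -u)
    z = _solve((T[k + 1:, k + 1:] - lam * np.eye(n - k - 1)).T, -v)
    x = Q @ np.concatenate((w, [1.0], np.zeros(n - k - 1)))
    y = Q @ np.concatenate((np.zeros(k), [1.0], z))
    inv_s = np.sqrt((1.0 + w @ w) * (1.0 + z @ z))
    return lam, x, y, inv_s

rng = np.random.default_rng(3)
T = np.triu(rng.standard_normal((5, 5)))
np.fill_diagonal(T, [1.0, 2.0, 3.0, 4.0, 5.0])
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))
A = Q @ T @ Q.T
lam, x, y, inv_s = schur_eigvecs(Q, T, 2)
print(lam, np.linalg.norm(A @ x - lam * x), inv_s)
```

Since $y^T x = 1$ by construction, $1/s(\lambda) = \|x\|_2 \|y\|_2$, which is what the product of square roots computes.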
7.6.5 Ascertaining Jordan Block Structures


Suppose that we have computed the real Schur decomposition $A = Q T Q^T$, identified clusters of "equal" eigenvalues, and calculated the corresponding block diagonalization $T = Y \mathrm{diag}(T_{11}, \ldots, T_{qq}) Y^{-1}$. As we have seen, this can be a formidable task. However, even greater numerical problems confront us if we attempt to ascertain the Jordan block structure of each $T_{ii}$. A brief examination of these difficulties will serve to highlight the limitations of the Jordan decomposition.


Assume for clarity that $\lambda(T_{ii})$ is real. The reduction of $T_{ii}$ to Jordan form begins by replacing it with a matrix of the form $C = \lambda I + N$, where N is the strictly upper triangular portion of $T_{ii}$ and where $\lambda$, say, is the mean of its eigenvalues.


Recall that the dimension of a Jordan block $J(\lambda)$ is the smallest nonnegative integer k for which $[J(\lambda) - \lambda I]^k = 0$. Thus, if $p_i = \dim[\mathrm{null}(N^i)]$ for i = 0:n, then $p_i - p_{i-1}$ equals the number of blocks in C's Jordan form that have dimension i or greater. A concrete example helps to make this assertion clear and to illustrate the role of the SVD in Jordan form computations.


Assume that C is 7-by-7. Suppose we compute the SVD $U_1^T N V_1 = \Sigma_1$ and "discover" that N has rank 3. If we order the singular values from small to large then it follows that the matrix $N_1 = V_1^T N V_1$ has the form

$$N_1 = \begin{bmatrix} 0 & K \\ 0 & L \end{bmatrix}, \qquad K \in \mathbb{R}^{4\times 3},\; L \in \mathbb{R}^{3\times 3}.$$

At this point, we know that the geometric multiplicity of $\lambda$ is 4, i.e., C's Jordan form has 4 blocks ($p_1 - p_0 = 4 - 0 = 4$).


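Now suppose $U_2^T L V_2 = \Sigma_2$ is the SVD of L and that we find that L has unit rank. If we again order the singular values from small to large, then $L_2 = V_2^T L V_2$ clearly has the following structure:

$$L_2 = \begin{bmatrix} 0 & 0 & a \\ 0 & 0 & b \\ 0 & 0 & c \end{bmatrix}.$$

However, $\lambda(L_2) = \lambda(L) = \{0, 0, 0\}$ and so c = 0. Thus, if

$$\hat V_2 = \mathrm{diag}(I_4, V_2)$$

then $N_2 = \hat V_2^T N_1 \hat V_2$ has the following form:

$$N_2 = \begin{bmatrix} 0&0&0&0&\times&\times&\times \\ 0&0&0&0&\times&\times&\times \\ 0&0&0&0&\times&\times&\times \\ 0&0&0&0&\times&\times&\times \\ 0&0&0&0&0&0&\times \\ 0&0&0&0&0&0&\times \\ 0&0&0&0&0&0&0 \end{bmatrix}.$$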


Besides allowing us to introduce more zeros into the upper triangle, the SVD of L also enables us to deduce the dimension of the null space of $N^2$. Since

$$N_1^2 = \begin{bmatrix} 0 & KL \\ 0 & L^2 \end{bmatrix} = \begin{bmatrix} 0 & K \\ 0 & L \end{bmatrix} \begin{bmatrix} 0 & K \\ 0 & L \end{bmatrix}$$

and $\begin{bmatrix} K \\ L \end{bmatrix}$ has full column rank,

$$p_2 = \dim(\mathrm{null}(N^2)) = \dim(\mathrm{null}(N_1^2)) = 4 + \dim(\mathrm{null}(L)) = p_1 + 2.$$

Hence, we can conclude at this stage that the Jordan form of C has at least two blocks of dimension 2 or greater.

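Finally, it is easy to see that $N^3 = 0$, from which we conclude that there is $p_3 - p_2 = 7 - 6 = 1$ block of dimension 3 or larger. If we define $V = V_1 \hat V_2$ then it follows that the decomposition

$$V^T C V = \begin{bmatrix} \lambda&0&0&0&\times&\times&\times \\ 0&\lambda&0&0&\times&\times&\times \\ 0&0&\lambda&0&\times&\times&\times \\ 0&0&0&\lambda&\times&\times&\times \\ 0&0&0&0&\lambda&0&\times \\ 0&0&0&0&0&\lambda&\times \\ 0&0&0&0&0&0&\lambda \end{bmatrix}$$

(rows 1 to 4: 4 blocks of order 1 or larger; rows 5 and 6: 2 blocks of order 2 or larger; row 7: 1 block of order 3 or larger) "displays" C's Jordan block structure: 2 blocks of order 1, 1 block of order 2, and 1 block of order 3.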
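The counting argument can be automated, with the caveat the text stresses: every step is a rank decision. The sketch below computes $p_i = \dim(\mathrm{null}(N^i))$ from SVD-based numerical ranks (`numpy.linalg.matrix_rank` with an absolute threshold `tol`, whose value is our assumption) and turns the differences $p_i - p_{i-1}$ into block counts. The test matrix is hypothetical: a nilpotent matrix with the block structure of the 7-by-7 example, disguised by an orthogonal similarity.

```python
import numpy as np

def jordan_block_counts(N, tol=1e-8):
    """Return (at_least, exact): at_least[i-1] = p_i - p_{i-1} = number of blocks of
    size >= i, and exact maps each size to the number of blocks of exactly that size."""
    n = N.shape[0]
    p = [0]
    P = np.eye(n)
    for _ in range(n):
        P = P @ N
        p.append(n - np.linalg.matrix_rank(P, tol=tol))   # the rank decision
    at_least = [p[i] - p[i - 1] for i in range(1, n + 1)] + [0]
    exact = {i: at_least[i - 1] - at_least[i] for i in range(1, n + 1)
             if at_least[i - 1] - at_least[i] > 0}
    return at_least[:n], exact

# Nilpotent part with Jordan blocks of sizes 1, 1, 2, 3, hidden by an orthogonal similarity.
sizes = [1, 1, 2, 3]
J = np.zeros((7, 7))
start = 0
for m in sizes:
    for r in range(start, start + m - 1):
        J[r, r + 1] = 1.0
    start += m
rng = np.random.default_rng(4)
V, _ = np.linalg.qr(rng.standard_normal((7, 7)))
N = V @ J @ V.T
at_least, exact = jordan_block_counts(N)
print(at_least, exact)
```

Perturbing N by a tiny random matrix and varying `tol` is a quick way to see how fragile these counts become.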

To compute the Jordan decomposition it is necessary to resort to nonorthogonal transformations. We refer the reader to either Golub and Wilkinson (1976) or Kågström and Ruhe (1980a, 1980b) for how to proceed with this phase of the reduction.

The above calculations with the SVD amply illustrate that difficult rank decisions must be made at each stage and that the final computed block structure depends critically on those decisions. Fortunately, the stable Schur decomposition can almost always be used in lieu of the Jordan decomposition in practical applications.
Problems


P7.6.1 Give a complete algorithm for solving a real, n-by-n, upper quasi-triangular 
system Tx = b. 

P7.6.2 Suppose $U^{-1} A U = \mathrm{diag}(\alpha_1, \ldots, \alpha_m)$ and $V^{-1} B V = \mathrm{diag}(\beta_1, \ldots, \beta_n)$. Show that if $\phi(X) = AX + XB$, then $\lambda(\phi) = \{\alpha_i + \beta_j : i = 1{:}m,\ j = 1{:}n\}$. What are the corresponding eigenvectors? How can these decompositions be used to solve AX + XB = C?


P7.6.3 Show that if $Y = \begin{bmatrix} I & Z \\ 0 & I \end{bmatrix}$ then $\kappa_2(Y) = [2 + \sigma^2 + \sqrt{4\sigma^2 + \sigma^4}\,]/2$ where $\sigma = \|Z\|_2$.

P7.6.4 Derive the system (7.6.5). 

P7.6.5 Assume that $T \in \mathbb{R}^{n\times n}$ is block upper triangular and partitioned as follows:

$$T = \begin{bmatrix} T_{11} & T_{12} & T_{13} \\ 0 & T_{22} & T_{23} \\ 0 & 0 & T_{33} \end{bmatrix}.$$

Suppose that the diagonal block $T_{22}$ is 2-by-2 with complex eigenvalues that are disjoint from $\lambda(T_{11})$ and $\lambda(T_{33})$. Give an algorithm for computing the 2-dimensional real invariant subspace associated with $T_{22}$'s eigenvalues.

P7.6.6 Suppose $H \in \mathbb{R}^{n\times n}$ is upper Hessenberg with a complex eigenvalue $\lambda + i\mu$. How could inverse iteration be used to compute $x, y \in \mathbb{R}^n$ so that $H(x + iy) = (\lambda + i\mu)(x + iy)$? Hint: compare real and imaginary parts in this equation and obtain a 2n-by-2n real system.


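P7.6.7 (a) Prove that if $\mu_0 \in \mathbb{C}$ has nonzero real part, then the iteration

$$\mu_{k+1} = \frac{1}{2}\left(\mu_k + \frac{1}{\mu_k}\right)$$

converges to 1 if $\mathrm{Re}(\mu_0) > 0$ and to −1 if $\mathrm{Re}(\mu_0) < 0$. (b) Suppose $A \in \mathbb{C}^{n\times n}$ is diagonalizable and that

$$A = X \begin{bmatrix} D_+ & 0 \\ 0 & D_- \end{bmatrix} X^{-1}$$

where $D_+ \in \mathbb{C}^{p\times p}$ and $D_- \in \mathbb{C}^{(n-p)\times(n-p)}$ are diagonal with eigenvalues in the open right half plane and open left half plane, respectively. Show that the iteration

$$A_{k+1} = \frac{1}{2}\left(A_k + A_k^{-1}\right), \qquad A_0 = A,$$

converges to

$$\mathrm{sign}(A) = X \begin{bmatrix} I_p & 0 \\ 0 & -I_{n-p} \end{bmatrix} X^{-1}.$$

(c) Suppose

$$M = \begin{bmatrix} M_{11} & M_{12} \\ 0 & M_{22} \end{bmatrix}, \qquad M_{11} \in \mathbb{C}^{p\times p},\; M_{22} \in \mathbb{C}^{(n-p)\times(n-p)},$$

with the property that $\lambda(M_{11})$ is in the open right half plane and $\lambda(M_{22})$ is in the open left half plane. Show that

$$\mathrm{sign}(M) = \begin{bmatrix} I_p & Z \\ 0 & -I_{n-p} \end{bmatrix}$$

and that −Z/2 solves $M_{11} X - X M_{22} = -M_{12}$. Thus,

$$U = \begin{bmatrix} I_p & -Z/2 \\ 0 & I_{n-p} \end{bmatrix} \;\Rightarrow\; U^{-1} M U = \begin{bmatrix} M_{11} & 0 \\ 0 & M_{22} \end{bmatrix}.$$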
C. Bavely and G.W. Stewart (1979). “An Algorithm for Computing Reducing Subspaces by Block Diagonalization,” SIAM J. Num. Anal. 16, 359-67.

B. Kågström and A. Ruhe (1980a). “An Algorithm for Numerical Computation of the Jordan Normal Form of a Complex Matrix,” ACM Trans. Math. Soft. 6, 398-419.
eigenvector include


S.P. Chan and B.N. Parlett (1977). “Algorithm 517: A Program for Computing the 
Condition Numbers of Matrix Eigenvalues Without Computing Eigenvectors,” ACM 
Trans. Math. Soft. 3, 186-203. 
Lin. Alg. and Its Applic. 88/89, 715-732.

Z. Bai, J. Demmel, and A. McKenney (1993). “On Computing Condition Numbers for the Nonsymmetric Eigenproblem,” ACM Trans. Math. Soft. 19, 202-223.


As we have seen, the sep(·,·) function is of great importance in the assessment of a computed invariant subspace. Aspects of this quantity and the associated Sylvester equation

J. Varah (1979). “On the Separation of Two Matrices,” SIAM J. Num. Anal. 16, 212-22.

R. Byers (1984). “A Linpack-Style Condition Estimator for the Equation AX − XB^T = C,” IEEE Trans. Auto. Cont. AC-29, 926-928.

and Its Appl. 109, 91-105.

N.J. Higham (1993). “Perturbation Theory and Backward Error for AX − XB = C,” BIT 33, 124-136.

J. Gardiner, M.R. Wette, A.J. Laub, J.J. Amato, and C.B. Moler (1992). “Algorithm 705: A FORTRAN-77 Software Package for Solving the Sylvester Matrix Equation AXB^T + CXD^T = E,” ACM Trans. Math. Soft. 18, 232-238.


Numerous algorithms have been proposed for the Sylvester equation, but those described 
in 


R.H. Bartels and G.W. Stewart (1972). “Solution of the Equation AX + XB = C,” 
Comm. ACM 15, 820-26. 

G.H. Golub, S. Nash, and C. Van Loan (1979). “A Hessenberg-Schur Method for the 
Matrix Problem AX + XB = C," IEEE Trans. Auto. Cont. AC-24, 909-13. 


are among the more reliable in that they rely on orthogonal transformations. A constrained Sylvester equation problem is considered in

J.B. Barlow, M.M. Monahemi, and D.P. O'Leary (1992). “Constrained Matrix Sylvester Equations,” SIAM J. Matrix Anal. Appl. 13, 1-9.

The Lyapunov problem $FX + XF^T = -C$ where C is non-negative definite has a very important role to play in control theory. See


S. Barnett and C. Storey (1968). “Some Applications of the Lyapunov Matrix Equation,” 
J. Inst. Math. Applic. 4, 33-42. 

G. Hewer and C. Kenney (1988). “The Sensitivity of the Stable Lyapunov Equation,” SIAM J. Control Optim. 26, 321-344.

A.R. Ghavimi and A.J. Laub (1995). “Residual Bounds for Discrete-Time Lyapunov 
Equations,” IEEE Trans. Auto. Cont. 40, 1244-1249. 


Several authors have considered generalizations of the Sylvester equation, i.e., $\sum F_i X G_i = C$. These include


P. Lancaster (1970). “Explicit Solution of Linear Matrix Equations,” SIAM Review 12, 
544-66. 

H. Wimmer and A.D. Ziebur (1972). “Solving the Matrix Equations Σ f_p(A) X g_p(A) = C,” SIAM Review 14, 318-23.

W.J. Vetter (1975). "Vector Structures and Solutions of Linear Matrix Equations," Lin. 
Alg. and Its Applic. 10, 181-88. 


Some ideas about improving computed eigenvalues, eigenvectors, and invariant subspaces may be found in
J.J. Dongarra, C.B. Moler, and J.H. Wilkinson (1983). “Improving the Accuracy of
Computed Eigenvalues and Eigenvectors,” SIAM J. Numer. Anal. 20, 23-46. 


J.W. Demmel (1987). “Three Methods for Refining Estimates of Invariant Subspaces,” 
Computing 38, 43-57. 


Hessenberg/QR iteration techniques are fast, but not very amenable to parallel computation. Because of this there is a hunger for radically new approaches to the eigenproblem.
Here are some papers that focus on the matrix sign function and related ideas that have 
high performance potential: 


C.S. Kenney and A.J. Laub (1991). “Rational Iterative Methods for the Matrix Sign Function,” SIAM J. Matrix Anal. Appl. 12, 273-291.


C.S. Kenney, A.J. Laub, and P.M. Papadopoulos (1992). “Matrix Sign Algorithms for Riccati Equations,” IMA J. of Math. Control Inform. 9, 331-344.


C.S. Kenney and A.J. Laub (1992). “On Scaling Newton's Method for Polar Decomposition and the Matrix Sign Function,” SIAM J. Matrix Anal. Appl. 13, 688-706.


N.J. Higham (1994). “The Matrix Sign Decomposition and Its Relation to the Polar Decomposition,” Lin. Alg. and Its Applic. 212/213, 3-20.


L. Adams and P. Arbenz (1994). “Towards a Divide and Conquer Algorithm for the Real Nonsymmetric Eigenvalue Problem,” SIAM J. Matrix Anal. Appl. 15, 1333-1353.


7.7 The QZ Method for Ax = λBx


Let A and B be two n-by-n matrices. The set of all matrices of the form $A - \lambda B$ with $\lambda \in \mathbb{C}$ is said to be a pencil. The eigenvalues of the pencil are elements of the set $\lambda(A, B)$ defined by

$$\lambda(A, B) = \{z \in \mathbb{C} : \det(A - zB) = 0\}.$$

If $\lambda \in \lambda(A, B)$ and

$$Ax = \lambda Bx, \qquad x \neq 0, \tag{7.7.1}$$

then x is referred to as an eigenvector of $A - \lambda B$.


In this section we briefly survey some of the mathematical properties of the generalized eigenproblem (7.7.1) and present a stable method for its solution. The important case when A and B are symmetric with the latter positive definite is discussed in §8.7.2.


7.7.1 Background 


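The first thing to observe about the generalized eigenvalue problem is that there are n eigenvalues if and only if rank(B) = n. If B is rank deficient then $\lambda(A, B)$ may be finite, empty, or infinite:

$$A = \begin{bmatrix} 1 & 2 \\ 0 & 3 \end{bmatrix},\; B = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} \;\Rightarrow\; \lambda(A, B) = \{1\}$$

$$A = \begin{bmatrix} 1 & 2 \\ 0 & 3 \end{bmatrix},\; B = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} \;\Rightarrow\; \lambda(A, B) = \emptyset$$

$$A = \begin{bmatrix} 1 & 2 \\ 0 & 0 \end{bmatrix},\; B = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} \;\Rightarrow\; \lambda(A, B) = \mathbb{C}$$

Note that if $0 \neq \lambda \in \lambda(A, B)$ then $(1/\lambda) \in \lambda(B, A)$. Moreover, if B is nonsingular then $\lambda(A, B) = \lambda(B^{-1}A, I) = \lambda(B^{-1}A)$.

This last observation suggests one method for solving the $A - \lambda B$ problem when B is nonsingular:

• Solve BC = A for C using (say) Gaussian elimination with pivoting.
• Use the QR algorithm to compute the eigenvalues of C.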


Note that C will be affected by roundoff errors of order $u\|A\|_2\|B^{-1}\|_2$. If B is ill-conditioned, then this can rule out the possibility of computing any generalized eigenvalue accurately, even those eigenvalues that may be regarded as well-conditioned.


Example 7.7.1 If

$$A = \begin{bmatrix} 1.746 & .940 \\ 1.246 & 1.898 \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} .780 & .563 \\ .913 & .659 \end{bmatrix}$$

then $\lambda(A, B) = \{2, 1.07 \times 10^6\}$. With 7-digit floating point arithmetic, we find $\lambda(fl(AB^{-1})) = \{1.562539, 1.01 \times 10^6\}$. The poor quality of the small eigenvalue is because $\kappa_2(B) \approx 2 \times 10^6$. On the other hand, we find that

$$\lambda(I, fl(A^{-1}B)) = \{2.000001, 1.06 \times 10^6\}.$$

The accuracy of the small eigenvalue is improved because $\kappa_2(A) \approx 4$.
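Example 7.7.1 can be explored numerically. The sketch below (function name ours) obtains reference eigenvalues of the 2-by-2 pencil from the quadratic $\det(A - \lambda B) = \det(B)\lambda^2 - t\lambda + \det(A) = 0$ and then applies the "solve BC = A" approach. Single precision (about 7 significant digits) is used as an assumed stand-in for the 7-digit arithmetic of the example; its output is not expected to reproduce the digits quoted in the text.

```python
import numpy as np

A = np.array([[1.746, 0.940],
              [1.246, 1.898]])
B = np.array([[0.780, 0.563],
              [0.913, 0.659]])     # det(B) = 1e-6, so B is very ill-conditioned

def pencil_eigs_2x2(A, B):
    """Roots of det(A - lam*B) = det(B) lam^2 - t lam + det(A) for 2-by-2 A and B."""
    t = A[0, 0] * B[1, 1] + A[1, 1] * B[0, 0] - A[0, 1] * B[1, 0] - A[1, 0] * B[0, 1]
    return np.sort(np.roots([np.linalg.det(B), -t, np.linalg.det(A)]).real)

reference = pencil_eigs_2x2(A, B)                       # about [2, 1.0713e6]

# The "invert B" route in double precision ...
via_Binv = np.sort(np.linalg.eigvals(np.linalg.solve(B, A)).real)
# ... and in single precision, for both B^{-1}A and A^{-1}B (eigenvalues 1/lambda).
A32, B32 = A.astype(np.float32), B.astype(np.float32)
via_Binv32 = np.sort(np.linalg.eigvals(np.linalg.solve(B32, A32)).real)
via_Ainv32 = np.sort(1.0 / np.linalg.eigvals(np.linalg.solve(A32, B32)).real)
print(reference, via_Binv, via_Binv32, via_Ainv32)
```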
Example 7.7.1 suggests that we seek an alternative approach to the $A - \lambda B$ problem. One idea is to compute well-conditioned Q and Z such that the matrices

$$A_1 = Q^{-1} A Z, \qquad B_1 = Q^{-1} B Z \tag{7.7.2}$$

are each in canonical form. Note that $\lambda(A, B) = \lambda(A_1, B_1)$ since

$$Ax = \lambda Bx \;\Longleftrightarrow\; A_1 y = \lambda B_1 y, \qquad x = Zy.$$

We say that the pencils $A - \lambda B$ and $A_1 - \lambda B_1$ are equivalent if (7.7.2) holds with nonsingular Q and Z.
7.7.2 The Generalized Schur Decomposition


As in the standard eigenproblem $A - \lambda I$ there is a choice between canonical forms. Analogous to the Jordan form is a decomposition of Kronecker in which both $A_1$ and $B_1$ are block diagonal. The blocks are similar to Jordan blocks. The Kronecker canonical form poses the same numerical difficulties as the Jordan form. However, this decomposition does provide insight into the mathematical properties of the pencil $A - \lambda B$. See Wilkinson (1978) and Demmel and Kågström (1987) for details.

More attractive from the numerical point of view is the following decomposition described in Moler and Stewart (1973).


Theorem 7.7.1 (Generalized Schur Decomposition) If A and B are in $\mathbb{C}^{n\times n}$, then there exist unitary Q and Z such that $Q^H A Z = T$ and $Q^H B Z = S$ are upper triangular. If for some k, $t_{kk}$ and $s_{kk}$ are both zero, then $\lambda(A, B) = \mathbb{C}$. Otherwise

$$\lambda(A, B) = \{t_{ii}/s_{ii} : s_{ii} \neq 0\}.$$

Proof. Let $\{B_k\}$ be a sequence of nonsingular matrices that converge to B. For each k, let $Q_k^H (A B_k^{-1}) Q_k = R_k$ be a Schur decomposition of $A B_k^{-1}$. Let $Z_k$ be unitary such that $Z_k^H (B_k^{-1} Q_k) = S_k^{-1}$ is upper triangular. It follows that both $Q_k^H A Z_k = R_k S_k$ and $Q_k^H B_k Z_k = S_k$ are also upper triangular.

Using the Bolzano-Weierstrass theorem, we know that the bounded sequence $\{(Q_k, Z_k)\}$ has a converging subsequence, $\lim (Q_{k_i}, Z_{k_i}) = (Q, Z)$. It is easy to show that Q and Z are unitary and that $Q^H A Z$ and $Q^H B Z$ are upper triangular. The assertions about $\lambda(A, B)$ follow from the identity

$$\det(A - \lambda B) = \det(Q Z^H) \prod_{i=1}^{n} (t_{ii} - \lambda s_{ii}). \qquad \square$$
If A and B are real then the following decomposition, which corresponds to the real Schur decomposition (Theorem 7.4.1), is of interest.


Theorem 7.7.2 (Generalized Real Schur Decomposition) If A and B are in $\mathbb{R}^{n\times n}$ then there exist orthogonal matrices Q and Z such that $Q^T A Z$ is upper quasi-triangular and $Q^T B Z$ is upper triangular.


Proof. See Stewart (1972). □


In the remainder of this section we are concerned with the computation of 
this decomposition and the mathematical insight that it provides. 
7.7.3 Sensitivity Issues


The generalized Schur decomposition sheds light on the issue of eigenvalue sensitivity for the $A - \lambda B$ problem. Clearly, small changes in A and B can induce large changes in the eigenvalue $\lambda_i = t_{ii}/s_{ii}$ if $s_{ii}$ is small. However, as Stewart (1978) argues, it may not be appropriate to regard such an eigenvalue as "ill-conditioned." The reason is that the reciprocal $\mu_i = s_{ii}/t_{ii}$ might be a very well behaved eigenvalue for the pencil $\mu A - B$. In the Stewart analysis, A and B are treated symmetrically and the eigenvalues are regarded more as ordered pairs $(t_{ii}, s_{ii})$ than as quotients. With this point of view it becomes appropriate to measure eigenvalue perturbations in the chordal metric chord(a, b) defined by

$$\mathrm{chord}(a, b) = \frac{|a - b|}{\sqrt{1 + a^2}\,\sqrt{1 + b^2}}.$$
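The chordal metric is a one-liner. The small sketch below (function name ours) also checks numerically the property that motivates it, chord(a, b) = chord(1/a, 1/b), so that a large eigenvalue and its small reciprocal are measured alike; the last line applies it to the eigenvalue estimates quoted in Example 7.7.1.

```python
import numpy as np

def chord(a, b):
    """Chordal distance |a - b| / (sqrt(1 + |a|^2) * sqrt(1 + |b|^2))."""
    return abs(a - b) / (np.sqrt(1.0 + abs(a) ** 2) * np.sqrt(1.0 + abs(b) ** 2))

# Reciprocals are treated symmetrically.
a, b = 3.0, 2.5
print(chord(a, b), chord(1 / a, 1 / b))

# Example 7.7.1: the two estimates of the large eigenvalue are chordally close,
# while the poor estimate of the small eigenvalue is not.
print(chord(1.07e6, 1.01e6), chord(2.0, 1.562539))
```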

Stewart shows that if $\lambda$ is a distinct eigenvalue of $A - \lambda B$ and $\lambda_\epsilon$ is the corresponding eigenvalue of the perturbed pencil $\tilde A - \lambda \tilde B$ with $\|A - \tilde A\|_2 \approx \|B - \tilde B\|_2 \approx \epsilon$, then

$$\mathrm{chord}(\lambda, \lambda_\epsilon) \le \frac{\epsilon}{\sqrt{(y^H A x)^2 + (y^H B x)^2}} + O(\epsilon^2)$$

where x and y have unit 2-norm and satisfy $Ax = \lambda Bx$ and $y^H A = \lambda y^H B$. Note that the denominator in the upper bound is symmetric in A and B. The "truly" ill-conditioned eigenvalues are those for which this denominator is small.


The extreme case when $t_{kk} = s_{kk} = 0$ for some k has been studied by Wilkinson (1979). He makes the interesting observation that when this occurs, the remaining quotients $t_{ii}/s_{ii}$ can assume arbitrary values.


7.7.4 Hessenberg-Triangular Form


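The first step in computing the generalized Schur decomposition of the pair (A, B) is to reduce A to upper Hessenberg form and B to upper triangular form via orthogonal transformations. We first determine an orthogonal U such that $U^T B$ is upper triangular. Of course, to preserve eigenvalues, we must also update A in exactly the same way. Let's trace what happens in the n = 5 case.

$$A = U^T A = \begin{bmatrix} \times&\times&\times&\times&\times \\ \times&\times&\times&\times&\times \\ \times&\times&\times&\times&\times \\ \times&\times&\times&\times&\times \\ \times&\times&\times&\times&\times \end{bmatrix}, \qquad B = U^T B = \begin{bmatrix} \times&\times&\times&\times&\times \\ 0&\times&\times&\times&\times \\ 0&0&\times&\times&\times \\ 0&0&0&\times&\times \\ 0&0&0&0&\times \end{bmatrix}$$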
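Next, we reduce A to upper Hessenberg form while preserving B's upper triangular form. First, a Givens rotation $Q_{45}$ is determined to zero $a_{51}$:

$$A = Q_{45}^T A = \begin{bmatrix} \times&\times&\times&\times&\times \\ \times&\times&\times&\times&\times \\ \times&\times&\times&\times&\times \\ \times&\times&\times&\times&\times \\ 0&\times&\times&\times&\times \end{bmatrix}, \qquad B = Q_{45}^T B = \begin{bmatrix} \times&\times&\times&\times&\times \\ 0&\times&\times&\times&\times \\ 0&0&\times&\times&\times \\ 0&0&0&\times&\times \\ 0&0&0&\times&\times \end{bmatrix}$$

The nonzero entry arising in the (5,4) position in B can be zeroed by postmultiplying with an appropriate Givens rotation $Z_{45}$:

$$A = A Z_{45} = \begin{bmatrix} \times&\times&\times&\times&\times \\ \times&\times&\times&\times&\times \\ \times&\times&\times&\times&\times \\ \times&\times&\times&\times&\times \\ 0&\times&\times&\times&\times \end{bmatrix}, \qquad B = B Z_{45} = \begin{bmatrix} \times&\times&\times&\times&\times \\ 0&\times&\times&\times&\times \\ 0&0&\times&\times&\times \\ 0&0&0&\times&\times \\ 0&0&0&0&\times \end{bmatrix}$$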

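Zeros are similarly introduced into the (4, 1) and (3, 1) positions in A:

$$A = Q_{34}^T A = \begin{bmatrix} \times&\times&\times&\times&\times \\ \times&\times&\times&\times&\times \\ \times&\times&\times&\times&\times \\ 0&\times&\times&\times&\times \\ 0&\times&\times&\times&\times \end{bmatrix}, \qquad B = Q_{34}^T B = \begin{bmatrix} \times&\times&\times&\times&\times \\ 0&\times&\times&\times&\times \\ 0&0&\times&\times&\times \\ 0&0&\times&\times&\times \\ 0&0&0&0&\times \end{bmatrix}$$

$$A = A Z_{34} = \begin{bmatrix} \times&\times&\times&\times&\times \\ \times&\times&\times&\times&\times \\ \times&\times&\times&\times&\times \\ 0&\times&\times&\times&\times \\ 0&\times&\times&\times&\times \end{bmatrix}, \qquad B = B Z_{34} = \begin{bmatrix} \times&\times&\times&\times&\times \\ 0&\times&\times&\times&\times \\ 0&0&\times&\times&\times \\ 0&0&0&\times&\times \\ 0&0&0&0&\times \end{bmatrix}$$

$$A = Q_{23}^T A = \begin{bmatrix} \times&\times&\times&\times&\times \\ \times&\times&\times&\times&\times \\ 0&\times&\times&\times&\times \\ 0&\times&\times&\times&\times \\ 0&\times&\times&\times&\times \end{bmatrix}, \qquad B = Q_{23}^T B = \begin{bmatrix} \times&\times&\times&\times&\times \\ 0&\times&\times&\times&\times \\ 0&\times&\times&\times&\times \\ 0&0&0&\times&\times \\ 0&0&0&0&\times \end{bmatrix}$$

$$A = A Z_{23} = \begin{bmatrix} \times&\times&\times&\times&\times \\ \times&\times&\times&\times&\times \\ 0&\times&\times&\times&\times \\ 0&\times&\times&\times&\times \\ 0&\times&\times&\times&\times \end{bmatrix}, \qquad B = B Z_{23} = \begin{bmatrix} \times&\times&\times&\times&\times \\ 0&\times&\times&\times&\times \\ 0&0&\times&\times&\times \\ 0&0&0&\times&\times \\ 0&0&0&0&\times \end{bmatrix}$$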


A is now upper Hessenberg through its first column. The reduction is completed by zeroing $a_{52}$, $a_{42}$, and $a_{53}$. As is evident above, two orthogonal transformations are required for each $a_{ij}$ that is zeroed: one to do the zeroing and the other to restore B's triangularity. Either Givens rotations or 2-by-2 modified Householder transformations can be used. Overall we have:


Algorithm 7.7.1 (Hessenberg-Triangular Reduction) Given A and B in $\mathbb{R}^{n\times n}$, the following algorithm overwrites A with an upper Hessenberg matrix $Q^T A Z$ and B with an upper triangular matrix $Q^T B Z$ where both Q and Z are orthogonal.

Using Algorithm 5.2.1, overwrite B with Q^T B = R where Q is orthogonal and R is upper triangular.
A = Q^T A
for j = 1:n−2
    for i = n:−1:j+2
        [c, s] = givens(A(i−1, j), A(i, j))
        A(i−1:i, j:n) = [c s; −s c]^T A(i−1:i, j:n)
        B(i−1:i, i−1:n) = [c s; −s c]^T B(i−1:i, i−1:n)
        [c, s] = givens(−B(i, i), B(i, i−1))
        B(1:i, i−1:i) = B(1:i, i−1:i) [c s; −s c]
        A(1:n, i−1:i) = A(1:n, i−1:i) [c s; −s c]
    end
end

This algorithm requires about $8n^3$ flops. The accumulation of Q and Z requires about $4n^3$ and $3n^3$ flops, respectively.
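A minimal NumPy sketch of Algorithm 7.7.1, assuming `numpy.linalg.qr` (Householder-based, via LAPACK) is an acceptable stand-in for Algorithm 5.2.1. The rotation helper uses the same convention as the pseudocode, and Q and Z are accumulated explicitly so the result can be checked; the function names are ours.

```python
import numpy as np

def givens(a, b):
    """c, s with [[c, s], [-s, c]].T @ [a, b] = [r, 0]."""
    if b == 0.0:
        return 1.0, 0.0
    if abs(b) > abs(a):
        tau = -a / b
        s = 1.0 / np.sqrt(1.0 + tau * tau)
        return s * tau, s
    tau = -b / a
    c = 1.0 / np.sqrt(1.0 + tau * tau)
    return c, c * tau

def hess_triangular(A, B):
    """Return (Q^T A Z, Q^T B Z, Q, Z): upper Hessenberg and upper triangular factors."""
    A = np.array(A, dtype=float)
    n = A.shape[0]
    Q, B = np.linalg.qr(np.array(B, dtype=float))   # B = Q R; keep R in B
    A = Q.T @ A
    Z = np.eye(n)
    for j in range(n - 2):
        for i in range(n - 1, j + 1, -1):
            # Rotate rows i-1, i to zero A[i, j]; this fills in B[i, i-1].
            c, s = givens(A[i - 1, j], A[i, j])
            G = np.array([[c, s], [-s, c]])
            A[i - 1:i + 1, j:] = G.T @ A[i - 1:i + 1, j:]
            B[i - 1:i + 1, i - 1:] = G.T @ B[i - 1:i + 1, i - 1:]
            Q[:, i - 1:i + 1] = Q[:, i - 1:i + 1] @ G
            # Rotate columns i-1, i to zero B[i, i-1]; column j of A is untouched.
            c, s = givens(-B[i, i], B[i, i - 1])
            G = np.array([[c, s], [-s, c]])
            B[:i + 1, i - 1:i + 1] = B[:i + 1, i - 1:i + 1] @ G
            A[:, i - 1:i + 1] = A[:, i - 1:i + 1] @ G
            Z[:, i - 1:i + 1] = Z[:, i - 1:i + 1] @ G
    return A, B, Q, Z

rng = np.random.default_rng(7)
A0 = rng.standard_normal((6, 6))
B0 = rng.standard_normal((6, 6))
H, R, Q, Z = hess_triangular(A0, B0)
print(np.round(H, 3))
```

In practice one would call a library routine instead (LAPACK's DGGHRD performs this reduction once B has been triangularized).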

The reduction of $A - \lambda B$ to Hessenberg-triangular form serves as a "front end" decomposition for a generalized QR iteration known as the QZ iteration which we describe next.


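Example 7.7.3 If

$$A = \begin{bmatrix} 10 & 1 & 2 \\ 1 & 2 & -1 \\ 1 & 1 & 2 \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}$$

and orthogonal matrices Q and Z are defined by

$$Q = \begin{bmatrix} -.1231 & -.9917 & .0378 \\ -.4924 & .0279 & -.8699 \\ -.8616 & .1257 & .4917 \end{bmatrix} \quad \text{and} \quad Z = \begin{bmatrix} 1.0000 & 0.0000 & 0.0000 \\ 0.0000 & -.8944 & -.4472 \\ 0.0000 & .4472 & -.8944 \end{bmatrix},$$

then $A_1 = Q^T A Z$ and $B_1 = Q^T B Z$ are given by

$$A_1 = \begin{bmatrix} -2.5849 & 1.5413 & 2.4221 \\ -9.7631 & .0874 & 1.9239 \\ 0.0000 & 2.7233 & -.7612 \end{bmatrix} \quad \text{and} \quad B_1 = \begin{bmatrix} -8.1240 & 3.6332 & 14.202 \\ 0.0000 & 0.0000 & 1.873 \\ 0.0000 & 0.0000 & .761 \end{bmatrix}.$$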
7.7.5 Deflation


In describing the QZ iteration we may assume without loss of generality that A is an unreduced upper Hessenberg matrix and that B is a nonsingular upper triangular matrix. The first of these assertions is obvious, for if $a_{k+1,k} = 0$ then

$$A - \lambda B = \begin{bmatrix} A_{11} - \lambda B_{11} & A_{12} - \lambda B_{12} \\ 0 & A_{22} - \lambda B_{22} \end{bmatrix}, \qquad A_{11}, B_{11} \in \mathbb{R}^{k\times k},$$

and we may proceed to solve the two smaller problems $A_{11} - \lambda B_{11}$ and $A_{22} - \lambda B_{22}$. On the other hand, if $b_{kk} = 0$ for some k, then it is possible to introduce a zero in A's (n, n−1) position and thereby deflate. Illustrating by example, suppose n = 5 and k = 3:


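$$A = \begin{bmatrix} \times&\times&\times&\times&\times \\ \times&\times&\times&\times&\times \\ 0&\times&\times&\times&\times \\ 0&0&\times&\times&\times \\ 0&0&0&\times&\times \end{bmatrix}, \qquad B = \begin{bmatrix} \times&\times&\times&\times&\times \\ 0&\times&\times&\times&\times \\ 0&0&0&\times&\times \\ 0&0&0&\times&\times \\ 0&0&0&0&\times \end{bmatrix}$$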


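The zero on B's diagonal can be "pushed down" to the (5,5) position as follows using Givens rotations:

$$A = Q_{34}^T A = \begin{bmatrix} \times&\times&\times&\times&\times \\ \times&\times&\times&\times&\times \\ 0&\times&\times&\times&\times \\ 0&\times&\times&\times&\times \\ 0&0&0&\times&\times \end{bmatrix}, \qquad B = Q_{34}^T B = \begin{bmatrix} \times&\times&\times&\times&\times \\ 0&\times&\times&\times&\times \\ 0&0&0&\times&\times \\ 0&0&0&0&\times \\ 0&0&0&0&\times \end{bmatrix}$$

$$A = A Z_{23} = \begin{bmatrix} \times&\times&\times&\times&\times \\ \times&\times&\times&\times&\times \\ 0&\times&\times&\times&\times \\ 0&0&\times&\times&\times \\ 0&0&0&\times&\times \end{bmatrix}, \qquad B = B Z_{23} = \begin{bmatrix} \times&\times&\times&\times&\times \\ 0&\times&\times&\times&\times \\ 0&0&0&\times&\times \\ 0&0&0&0&\times \\ 0&0&0&0&\times \end{bmatrix}$$

$$A = Q_{45}^T A = \begin{bmatrix} \times&\times&\times&\times&\times \\ \times&\times&\times&\times&\times \\ 0&\times&\times&\times&\times \\ 0&0&\times&\times&\times \\ 0&0&\times&\times&\times \end{bmatrix}, \qquad B = Q_{45}^T B = \begin{bmatrix} \times&\times&\times&\times&\times \\ 0&\times&\times&\times&\times \\ 0&0&0&\times&\times \\ 0&0&0&0&\times \\ 0&0&0&0&0 \end{bmatrix}$$

$$A = A Z_{34} = \begin{bmatrix} \times&\times&\times&\times&\times \\ \times&\times&\times&\times&\times \\ 0&\times&\times&\times&\times \\ 0&0&\times&\times&\times \\ 0&0&0&\times&\times \end{bmatrix}, \qquad B = B Z_{34} = \begin{bmatrix} \times&\times&\times&\times&\times \\ 0&\times&\times&\times&\times \\ 0&0&\times&\times&\times \\ 0&0&0&0&\times \\ 0&0&0&0&0 \end{bmatrix}$$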
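$$A = A Z_{45} = \begin{bmatrix} \times&\times&\times&\times&\times \\ \times&\times&\times&\times&\times \\ 0&\times&\times&\times&\times \\ 0&0&\times&\times&\times \\ 0&0&0&0&\times \end{bmatrix}, \qquad B = B Z_{45} = \begin{bmatrix} \times&\times&\times&\times&\times \\ 0&\times&\times&\times&\times \\ 0&0&\times&\times&\times \\ 0&0&0&\times&\times \\ 0&0&0&0&0 \end{bmatrix}$$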


This zero-chasing technique is perfectly general and can be used to zero $a_{n,n-1}$ regardless of where the zero appears along B's diagonal.
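The zero-chasing deflation can be sketched as follows for a general position k of the zero (0-based here). This is our own rendering of the pattern displayed above, with hypothetical function names: a row rotation pushes the zero from $b_{ii}$ to $b_{i+1,i+1}$, a column rotation removes the resulting fill-in in A, and a final column rotation zeros $a_{n,n-1}$.

```python
import numpy as np

def givens(a, b):
    """c, s with [[c, s], [-s, c]].T @ [a, b] = [r, 0]."""
    if b == 0.0:
        return 1.0, 0.0
    if abs(b) > abs(a):
        tau = -a / b
        s = 1.0 / np.sqrt(1.0 + tau * tau)
        return s * tau, s
    tau = -b / a
    c = 1.0 / np.sqrt(1.0 + tau * tau)
    return c, c * tau

def chase_zero_down(A, B, k):
    """Hessenberg A, triangular B with B[k, k] == 0. Returns (Q^T A Z, Q^T B Z, Q, Z)
    with the zero moved to B[n-1, n-1] and A[n-1, n-2] == 0."""
    A = np.array(A, dtype=float)
    B = np.array(B, dtype=float)
    n = A.shape[0]
    Q, Z = np.eye(n), np.eye(n)
    for i in range(k, n - 1):
        # Row rotation (i, i+1) zeroing B[i+1, i+1]; B[i, i] stays zero.
        c, s = givens(B[i, i + 1], B[i + 1, i + 1])
        G = np.array([[c, s], [-s, c]])
        A[i:i + 2, :] = G.T @ A[i:i + 2, :]
        B[i:i + 2, :] = G.T @ B[i:i + 2, :]
        Q[:, i:i + 2] = Q[:, i:i + 2] @ G
        B[i + 1, i + 1] = 0.0
        if i >= 1:
            # Column rotation (i-1, i) removing the fill-in A[i+1, i-1].
            c, s = givens(-A[i + 1, i], A[i + 1, i - 1])
            G = np.array([[c, s], [-s, c]])
            A[:, i - 1:i + 1] = A[:, i - 1:i + 1] @ G
            B[:, i - 1:i + 1] = B[:, i - 1:i + 1] @ G
            Z[:, i - 1:i + 1] = Z[:, i - 1:i + 1] @ G
            A[i + 1, i - 1] = 0.0
    # Final column rotation (n-2, n-1) zeroing A[n-1, n-2].
    c, s = givens(-A[n - 1, n - 1], A[n - 1, n - 2])
    G = np.array([[c, s], [-s, c]])
    A[:, n - 2:] = A[:, n - 2:] @ G
    B[:, n - 2:] = B[:, n - 2:] @ G
    Z[:, n - 2:] = Z[:, n - 2:] @ G
    A[n - 1, n - 2] = 0.0
    return A, B, Q, Z

rng = np.random.default_rng(5)
n = 5
A0 = np.triu(rng.standard_normal((n, n)), -1)
B0 = np.triu(rng.standard_normal((n, n)))
B0[2, 2] = 0.0                                   # the n = 5, k = 3 case of the text
A1, B1, Q, Z = chase_zero_down(A0, B0, 2)
```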


7.7.6 The QZ Step 


We are now in a position to describe a QZ step. The basic idea is to update A and B as follows

$$(\bar A - \lambda \bar B) = Q^T (A - \lambda B) Z,$$

where $\bar A$ is upper Hessenberg, $\bar B$ is upper triangular, Q and Z are each orthogonal, and $\bar A \bar B^{-1}$ is essentially the same matrix that would result if a Francis QR step (Algorithm 7.5.2) were explicitly applied to $AB^{-1}$. This can be done with some clever zero-chasing and an appeal to the implicit Q theorem.

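Let $M = AB^{-1}$ (upper Hessenberg) and let v be the first column of the matrix $(M - aI)(M - bI)$, where a and b are the eigenvalues of M's lower 2-by-2 submatrix. Note that v can be calculated in O(1) flops. If $P_0$ is a Householder matrix such that $P_0 v$ is a multiple of $e_1$, then

$$A = P_0 A = \begin{bmatrix} \times&\times&\times&\times&\times&\times \\ \times&\times&\times&\times&\times&\times \\ \times&\times&\times&\times&\times&\times \\ 0&0&\times&\times&\times&\times \\ 0&0&0&\times&\times&\times \\ 0&0&0&0&\times&\times \end{bmatrix}, \qquad B = P_0 B = \begin{bmatrix} \times&\times&\times&\times&\times&\times \\ \times&\times&\times&\times&\times&\times \\ \times&\times&\times&\times&\times&\times \\ 0&0&0&\times&\times&\times \\ 0&0&0&0&\times&\times \\ 0&0&0&0&0&\times \end{bmatrix}$$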

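The idea now is to restore these matrices to Hessenberg-triangular form by chasing the unwanted nonzero elements down the diagonal. To this end, we first determine a pair of Householder matrices $Z_1$ and $Z_2$ to zero $b_{31}$, $b_{32}$, and $b_{21}$:

$$A = A Z_1 Z_2 = \begin{bmatrix} \times&\times&\times&\times&\times&\times \\ \times&\times&\times&\times&\times&\times \\ \times&\times&\times&\times&\times&\times \\ \times&\times&\times&\times&\times&\times \\ 0&0&0&\times&\times&\times \\ 0&0&0&0&\times&\times \end{bmatrix}, \qquad B = B Z_1 Z_2 = \begin{bmatrix} \times&\times&\times&\times&\times&\times \\ 0&\times&\times&\times&\times&\times \\ 0&0&\times&\times&\times&\times \\ 0&0&0&\times&\times&\times \\ 0&0&0&0&\times&\times \\ 0&0&0&0&0&\times \end{bmatrix}$$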


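Then a Householder matrix $P_1$ is used to zero $a_{31}$ and $a_{41}$:

$$A = P_1 A = \begin{bmatrix} \times&\times&\times&\times&\times&\times \\ \times&\times&\times&\times&\times&\times \\ 0&\times&\times&\times&\times&\times \\ 0&\times&\times&\times&\times&\times \\ 0&0&0&\times&\times&\times \\ 0&0&0&0&\times&\times \end{bmatrix}, \qquad B = P_1 B = \begin{bmatrix} \times&\times&\times&\times&\times&\times \\ 0&\times&\times&\times&\times&\times \\ 0&\times&\times&\times&\times&\times \\ 0&\times&\times&\times&\times&\times \\ 0&0&0&0&\times&\times \\ 0&0&0&0&0&\times \end{bmatrix}$$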


Notice that with this step the unwanted nonzero elements have been shifted down and to the right from their original position. This illustrates a typical step in the QZ iteration. Notice that $Q = P_0 P_1 \cdots P_{n-2}$ has the same first column as $P_0$. By the way the initial Householder matrix was determined, we can apply the implicit Q theorem and assert that $\bar A \bar B^{-1} = Q^T (AB^{-1}) Q$ is indeed essentially the same matrix that we would obtain by applying the Francis iteration to $M = AB^{-1}$ directly. Overall we have:


Algorithm 7.7.2 (The QZ Step) Given an unreduced upper Hessenberg matrix $A \in \mathbb{R}^{n\times n}$ and a nonsingular upper triangular matrix $B \in \mathbb{R}^{n\times n}$, the following algorithm overwrites A with the upper Hessenberg matrix $Q^T A Z$ and B with the upper triangular matrix $Q^T B Z$ where Q and Z are orthogonal and Q has the same first column as the orthogonal similarity transformation in Algorithm 7.5.1 when it is applied to $AB^{-1}$.
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Let M = AB^! and compute (M — aI)(M — bl)ei = (z,y,2,0,...,0) 
where a and b are the eigenvalues of M's lower 2-by-2. 
for k = 1:n — 2 
Find Householder Qx so Q&[zyz]T = [*00]". 
A = diag(Ik-1, Qk, In-k-2)A 
B= diag(Ig 1, Qk, In-k-2)B 
Find Householder Z;1 so 
[ desk br+2,k+1 Pka2ka2 | Ze =[0 0 * J. 
A= Adiag(Ip_1, Zi, In—k—2) 
B= Bdiag(Ip_1, Zki, In—k—2) 
Find Householder Zkz so 
[5x Paesi ] Ze2=[0 * |. 
A= Adiag(Ip—1, Zk2, In—k—1) 
B= Bdiag(Iy 1, Zk2, In-k-1) 
T = Ok41,k3 Y = @k4+1,k 
ifk «n—2 
Z = Ok+3,k 
end 
end 


Find Householder Qn—1 80 Qn-1 | T | = | * | 


y 
A = diag(I5 2, Qn-1)A 
B= diag(In—2, Qn-1)B 
Find Householder Z,_; so 
[ bn,n-1 bnn ] 2s = [ 0 * ] 
A = Adiag(In—2; Zn. 1) 
B = Bdiag(In—2, Zn-1) 


This algorithm requires $22n^2$ flops. $Q$ and $Z$ can be accumulated for an
additional $8n^2$ flops and $13n^2$ flops, respectively.


7.7.7 The Overall QZ Process 


By applying a sequence of QZ steps to the Hessenberg-triangular pencil
$A - \lambda B$, it is possible to reduce $A$ to quasi-triangular form. In doing this it
is necessary to monitor A's subdiagonal and B's diagonal in order to bring 
about decoupling whenever possible. The complete process, due to Moler 
and Stewart (1973), is as follows: 


Algorithm 7.7.3 Given $A \in \mathbb{R}^{n\times n}$ and $B \in \mathbb{R}^{n\times n}$, the following algorithm
computes orthogonal $Q$ and $Z$ such that $Q^TAZ = T$ is upper quasi-triangular
and $Q^TBZ = S$ is upper triangular. $A$ is overwritten by $T$ and
$B$ by $S$.

    Using Algorithm 7.7.1, overwrite A with Q^T A Z (upper Hessenberg)
        and B with Q^T B Z (upper triangular).
    until q = n
        Set to zero all subdiagonal elements of A that satisfy
            |a_{i,i-1}| ≤ ε(|a_{i-1,i-1}| + |a_{ii}|)
        Find the largest nonnegative q and the smallest nonnegative p
            such that if

                    [ A_11  A_12  A_13 ]  p
                A = [  0    A_22  A_23 ]  n-p-q
                    [  0     0    A_33 ]  q
                       p    n-p-q   q

            then A_33 is upper quasi-triangular and A_22 is unreduced
            upper Hessenberg.
        Partition B conformably.
        if q < n
            if B_22 is singular
                Zero a_{n-q,n-q-1}
            else
                Apply Algorithm 7.7.2 to A_22 and B_22
                A = diag(I_p, Q, I_q)^T A diag(I_p, Z, I_q)
                B = diag(I_p, Q, I_q)^T B diag(I_p, Z, I_q)
            end
        end
    end


This algorithm requires $30n^3$ flops. If $Q$ is desired, an additional $16n^3$ are
necessary. If $Z$ is required, an additional $20n^3$ are needed. These estimates
of work are based on the experience that about two QZ iterations per
eigenvalue are necessary. Thus, the convergence properties of QZ are the
same as for QR. The speed of the QZ algorithm is not affected by rank
deficiency in $B$.


The computed $S$ and $T$ can be shown to satisfy

$$Q_0^T(A + E)Z_0 = T, \qquad Q_0^T(B + F)Z_0 = S$$

where $Q_0$ and $Z_0$ are exactly orthogonal and $\|E\|_2 \approx u\|A\|_2$ and $\|F\|_2 \approx u\|B\|_2$.


Example 7.7.5 If the QZ algorithm is applied to

$$
A = \begin{bmatrix}
2 & 3 & 4 & 5 & 6 \\
4 & 4 & 5 & 6 & 7 \\
0 & 3 & 6 & 7 & 8 \\
0 & 0 & 2 & 8 & 9 \\
0 & 0 & 0 & 1 & 10
\end{bmatrix}
\quad\text{and}\quad
B = \begin{bmatrix}
1 & -1 & -1 & -1 & -1 \\
0 & 1 & -1 & -1 & -1 \\
0 & 0 & 1 & -1 & -1 \\
0 & 0 & 0 & 1 & -1 \\
0 & 0 & 0 & 0 & 1
\end{bmatrix}
$$

then the subdiagonal elements of $A$ converge as follows

    Iteration   O(|h21|)   O(|h32|)   O(|h43|)   O(|h54|)
        1        10^0       10^1       10^0       10^-1
        2        10^0       10^0       10^0       10^-1
        3        10^0       10^1       10^-1      10^-3
        4        10^0       10^0       10^-1      10^-8
        5        10^0       10^1       10^-1      10^-16
        6        10^0       10^0       10^-2      converg.
        7        10^0       10^-1      10^-4
        8        10^1       10^-1      10^-8
        9        10^0       10^-1      10^-19
       10        10^0       10^-2      converg.
       11        10^-1      10^-4
       12        10^-2      10^-1
       13        10^-3      10^-27
       14        converg.   converg.


7.7.8 Generalized Invariant Subspace Computations 


Many of the invariant subspace computations discussed in §7.6 carry over to
the generalized eigenvalue problem. For example, approximate eigenvectors
can be found via inverse iteration:

    q^{(0)} ∈ C^n given.
    for k = 1, 2, ...
        Solve (A - μB) z^{(k)} = B q^{(k-1)}
        Normalize: q^{(k)} = z^{(k)} / ||z^{(k)}||_2
        λ^{(k)} = [q^{(k)}]^H A q^{(k)} / [q^{(k)}]^H B q^{(k)}
    end

When $B$ is nonsingular, this is equivalent to applying (7.6.1) with the
matrix $B^{-1}A$. Typically, only a single iteration is required if $\mu$ is an
approximate eigenvalue computed by the QZ algorithm. By inverse iterating
with the Hessenberg-triangular pencil, costly accumulation of the
$Z$-transformations during the QZ iteration can be avoided.
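A minimal real-arithmetic sketch of this inverse iteration in Python. Assumptions: the data are real, so $q^T$ replaces $q^H$; the solver is plain Gaussian elimination with partial pivoting on a dense $A - \mu B$ (a practical code would factor $A - \mu B$ once and, as the text suggests, work with the Hessenberg-triangular pencil); the 2-by-2 pencil and the shift $\mu = 3.5$ are made-up illustrations whose generalized eigenvalues are $(5 \pm \sqrt{5})/2$. All function names are hypothetical helpers.

```python
def solve(C, b):
    """Solve C x = b by Gaussian elimination with partial pivoting (dense, small n)."""
    n = len(C)
    M = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(C)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))   # pivot row
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):                           # back substitution
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def generalized_inverse_iteration(A, B, mu, q, steps=20):
    """Inverse iteration for the pencil A - lambda*B with fixed shift mu (real case)."""
    n = len(A)
    C = [[A[i][j] - mu * B[i][j] for j in range(n)] for i in range(n)]   # A - mu*B
    lam = None
    for _ in range(steps):
        z = solve(C, matvec(B, q))                            # (A - mu B) z = B q
        norm = dot(z, z) ** 0.5
        q = [zi / norm for zi in z]                           # normalize in the 2-norm
        lam = dot(q, matvec(A, q)) / dot(q, matvec(B, q))     # q^T A q / q^T B q
    return lam, q


A = [[4.0, 1.0], [2.0, 3.0]]
B = [[2.0, 0.0], [0.0, 1.0]]
lam, q = generalized_inverse_iteration(A, B, 3.5, [1.0, 0.0])
print(lam)   # approaches (5 + sqrt(5))/2, the eigenvalue nearest the shift
```

Each step reduces the unwanted component by roughly $|\lambda_1 - \mu|/|\lambda_2 - \mu| \approx 0.06$ here, so a good shift makes very few steps necessary.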

Corresponding to the notion of an invariant subspace for a single matrix,
we have the notion of a deflating subspace for the pencil $A - \lambda B$. In
particular, we say that a $k$-dimensional subspace $S \subseteq \mathbb{R}^n$ is "deflating" for
the pencil $A - \lambda B$ if the subspace $\{Ax + By : x, y \in S\}$ has dimension $k$ or
less. Note that the columns of the matrix $Z$ in the generalized Schur decomposition
define a family of deflating subspaces, for if $Q = [\,q_1, \ldots, q_n\,]$ and
$Z = [\,z_1, \ldots, z_n\,]$ then we have $\mathrm{span}\{Az_1, \ldots, Az_k\} \subseteq \mathrm{span}\{q_1, \ldots, q_k\}$ and
$\mathrm{span}\{Bz_1, \ldots, Bz_k\} \subseteq \mathrm{span}\{q_1, \ldots, q_k\}$. Properties of deflating subspaces
and their behavior under perturbation are described in Stewart (1972).


Problems 
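P7.7.1 Suppose $A$ and $B$ are in $\mathbb{R}^{n\times n}$ and that

$$U^TBV = \begin{bmatrix} D & 0 \\ 0 & 0 \end{bmatrix}, \qquad U = [\,U_1 \;\; U_2\,], \qquad V = [\,V_1 \;\; V_2\,],$$

with block rows and columns of sizes $r$ and $n-r$, is the SVD of $B$, where $D$ is $r$-by-$r$ and
$r = \mathrm{rank}(B)$. Show that if $\lambda(A, B) = \mathbb{C}$ then $U_2^TAV_2$ is singular.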


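P7.7.2 Define $F : \mathbb{R}^n \to \mathbb{R}$ by

$$F(x) = \frac{1}{2}\left\| Ax - \frac{x^TB^TAx}{x^TB^TBx}\,Bx \right\|_2^2$$

where $A$ and $B$ are in $\mathbb{R}^{n\times n}$. Show that if $\nabla F(x) = 0$, then $Ax$ is a multiple of $Bx$.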


P7.7.3 Suppose $A$ and $B$ are in $\mathbb{R}^{n\times n}$. Give an algorithm for computing orthogonal $Q$
and $Z$ such that $Q^TAZ$ is upper Hessenberg and $Z^TBQ$ is upper triangular.


P7.7.4 Suppose

$$A = \begin{bmatrix} A_{11} & A_{12} \\ 0 & A_{22} \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} B_{11} & B_{12} \\ 0 & B_{22} \end{bmatrix}$$

with $A_{11}, B_{11} \in \mathbb{R}^{k\times k}$ and $A_{22}, B_{22} \in \mathbb{R}^{j\times j}$. Under what circumstances do there exist

$$X = \begin{bmatrix} I_k & X_{12} \\ 0 & I_j \end{bmatrix} \quad\text{and}\quad Y = \begin{bmatrix} I_k & Y_{12} \\ 0 & I_j \end{bmatrix}$$

so that $Y^{-1}AX$ and $Y^{-1}BX$ are both block diagonal? This is the generalized Sylvester
equation problem. Specify an algorithm for the case when $A_{11}$, $A_{22}$, $B_{11}$, and $B_{22}$ are
upper triangular. See Kågström (1994).
P7.7.5 Suppose $\mu \notin \lambda(A, B)$. Relate the eigenvalues and eigenvectors of $A_1 = (A - \mu B)^{-1}A$
and $B_1 = (A - \mu B)^{-1}B$ to the generalized eigenvalues and eigenvectors of $A - \lambda B$.
P7.7.6 Suppose $A, B, C, D \in \mathbb{R}^{n\times n}$. Show how to compute orthogonal matrices $Q$, $Z$, $U$,
and $V$ such that $Q^TAU$ is upper Hessenberg and $V^TCZ$, $Q^TBV$, and $V^TDZ$ are all
upper triangular. Note that this converts the pencil $AC - \lambda BD$ to Hessenberg-triangular
form. Your algorithm should not form the products $AC$ or $BD$ explicitly and should
not compute any matrix inverse. See Van Loan (1975).


Notes and References for Sec. 7.7 
Chapter 8


The Symmetric
Eigenvalue Problem


§8.1 Properties and Decompositions

§8.2 Power Iterations

§8.3 The Symmetric QR Algorithm

§8.4 Jacobi Methods

§8.5 Tridiagonal Methods

§8.6 Computing the SVD

§8.7 Some Generalized Eigenvalue Problems


The symmetric eigenvalue problem with its rich mathematical struc- 
ture is one of the most aesthetically pleasing problems in numerical linear 
algebra. We begin our presentation with a brief discussion of the math- 
ematical properties that underlie this computation. In §8.2 and §8.3 we 
develop various power iterations eventually focusing on the symmetric QR 
algorithm. 

In §8.4 we discuss Jacobi’s method, one of the earliest matrix algorithms 
to appear in the literature. This technique is of current interest because it is 
amenable to parallel computation and because under certain circumstances 
it has superior accuracy. 

Various methods for the tridiagonal case are presented in §8.5. These 
include the method of bisection and a divide and conquer technique. 

The computation of the singular value decomposition is detailed in §8.6. 
The central algorithm is a variant of the symmetric QR iteration that works 
on bidiagonal matrices. 
In the final section we discuss the generalized eigenvalue problem
$Ax = \lambda Bx$ for the important case when $A$ is symmetric and $B$ is symmetric
positive definite. No suitable analog of the orthogonally-based QZ algo- 
rithm (see §7.7) exists for this specially structured, generalized eigenprob- 
lem. However, there are several successful methods that can be applied 
and these are presented along with a discussion of the generalized singular 
value decomposition. 


Before You Begin 


Chapter 1, §§2.1-2.5, and §2.7, Chapter 3, §§4.1-4.3, §§5.1-5.5 and §7.1.1 
are assumed. Within this chapter there are the following dependencies: 


                  §8.4
                    ↑
    §8.1 → §8.2 → §8.3 → §8.6 → §8.7
                    ↓
                  §8.5


Many of the algorithms and theorems in this chapter have unsymmetric 
counterparts in Chapter 7. However, except for a few concepts and defini- 
tions, our treatment of the symmetric eigenproblem can be studied before 
reading Chapter 7. 

Complementary references include Wilkinson (1965), Stewart (1973), 
Gourlay and Watson (1973), Hager (1988), Chatelin (1993), Parlett (1980), 
Stewart and Sun (1990), Watkins (1991), Jennings and McKeowen (1992), 
and Datta (1995). Some Matlab functions important to this chapter are 
schur and svd. LAPACK connections include 


LAPACK: Symmetric Eigenproblem
    _SYEV    All eigenvalues and vectors
    _SYEVD   Same but uses divide and conquer for eigenvectors
    _SYEVX   Selected eigenvalues and vectors
    _SYTRD   Householder tridiagonalization
    _SBTRD   Householder tridiagonalization (A banded)
    _SPTRD   Householder tridiagonalization (A in packed storage)
    _STEQR   All eigenvalues and vectors of tridiagonal by implicit QR
    _STEDC   All eigenvalues and vectors of tridiagonal by divide and conquer
    _STERF   All eigenvalues of tridiagonal by root-free QR
    _PTEQR   All eigenvalues and eigenvectors of positive definite tridiagonal
    _STEBZ   Selected eigenvalues of tridiagonal by bisection
    _STEIN   Selected eigenvectors of tridiagonal by inverse iteration

LAPACK: Symmetric-Definite Eigenproblems
    _SYGST   Converts A - λB to C - λI form
    _PBSTF   Split Cholesky factorization
    _SBGST   Converts banded A - λB to C - λI form via split Cholesky

LAPACK: SVD
    _GESVD   A = UΣV^T
    _BDSQR   SVD of real bidiagonal matrix
    _GEBRD   Bidiagonalization of general matrix
    _ORGBR   Generates the orthogonal transformations
    _GBBRD   Bidiagonalization of band matrix

LAPACK: The Generalized Singular Value Problem
    _GGSVP   Converts A^TA - μ^2 B^TB to triangular A_1^TA_1 - μ^2 B_1^TB_1
    _TGSJA   Computes GSVD of a pair of triangular matrices


8.1 Properties and Decompositions 


In this section we set down the mathematics that is required to develop 
and analyze algorithms for the symmetric eigenvalue problem. 


8.1.1 Eigenvalues and Eigenvectors 


Symmetry guarantees that all of A's eigenvalues are real and that there is 
an orthonormal basis of eigenvectors. 


Theorem 8.1.1 (Symmetric Schur Decomposition) If $A \in \mathbb{R}^{n\times n}$ is symmetric,
then there exists a real orthogonal $Q$ such that

$$Q^TAQ = \Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n).$$

Moreover, for $k = 1{:}n$, $AQ(:,k) = \lambda_kQ(:,k)$. See Theorem 7.1.3.


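Proof. Suppose $\lambda_1 \in \lambda(A)$ and that $x \in \mathbb{C}^n$ is a unit 2-norm eigenvector
with $Ax = \lambda_1x$. Since $\bar{\lambda}_1 = x^HA^Hx = x^HAx = \lambda_1$, it follows
that $\lambda_1 \in \mathbb{R}$. Thus, we may assume that $x \in \mathbb{R}^n$. Let $P_1 \in \mathbb{R}^{n\times n}$ be
a Householder matrix such that $P_1^Tx = e_1 = I_n(:,1)$. It follows from
$Ax = \lambda_1x$ that $(P_1^TAP_1)e_1 = \lambda_1e_1$. This says that the first column of
$P_1^TAP_1$ is a multiple of $e_1$. But since $P_1^TAP_1$ is symmetric it must have
the form

$$P_1^TAP_1 = \begin{bmatrix} \lambda_1 & 0 \\ 0 & A_1 \end{bmatrix}$$

where $A_1 \in \mathbb{R}^{(n-1)\times(n-1)}$ is symmetric. By induction we may assume that
there is an orthogonal $Q_1 \in \mathbb{R}^{(n-1)\times(n-1)}$ such that $Q_1^TA_1Q_1 = \Lambda_1$ is diagonal.
The theorem follows by setting

$$Q = P_1\begin{bmatrix} 1 & 0 \\ 0 & Q_1 \end{bmatrix} \quad\text{and}\quad \Lambda = \begin{bmatrix} \lambda_1 & 0 \\ 0 & \Lambda_1 \end{bmatrix}$$

and comparing columns in the matrix equation $AQ = Q\Lambda$. □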


Example 8.1.1 If

$$A = \begin{bmatrix} 6.8 & 2.4 \\ 2.4 & 8.2 \end{bmatrix} \quad\text{and}\quad Q = \begin{bmatrix} .6 & -.8 \\ .8 & .6 \end{bmatrix}$$

then $Q$ is orthogonal and $Q^TAQ = \mathrm{diag}(10, 5)$.
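The example can be checked directly. This small Python sketch uses only plain lists; `matmul` and `transpose` are illustrative helpers, and the data are $A = [6.8\;2.4;\,2.4\;8.2]$ and $Q = [.6\;{-.8};\,.8\;.6]$. It forms $Q^TQ$ and $Q^TAQ$ and compares them with $I$ and $\mathrm{diag}(10, 5)$.

```python
def matmul(X, Y):
    """Dense product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    return [list(col) for col in zip(*X)]


A = [[6.8, 2.4], [2.4, 8.2]]
Q = [[0.6, -0.8], [0.8, 0.6]]
QtQ = matmul(transpose(Q), Q)               # should be the 2-by-2 identity
Lam = matmul(transpose(Q), matmul(A, Q))    # should be diag(10, 5) up to rounding
for row in Lam:
    print(["%.12f" % x for x in row])
```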


For a symmetric matrix $A$ we shall use the notation $\lambda_k(A)$ to designate the
$k$th largest eigenvalue. Thus,

$$\lambda_n(A) \le \cdots \le \lambda_2(A) \le \lambda_1(A).$$

It follows from the orthogonal invariance of the 2-norm that $A$ has singular
values $\{|\lambda_1(A)|, \ldots, |\lambda_n(A)|\}$ and so

$$\|A\|_2 = \max\{|\lambda_1(A)|, |\lambda_n(A)|\}.$$


The eigenvalues of a symmetric matrix have a “minimax” characterization
based on the values that can be assumed by the quadratic form ratio
$x^TAx/x^Tx$.

Theorem 8.1.2 (Courant-Fischer Minimax Theorem) If $A \in \mathbb{R}^{n\times n}$
is symmetric, then

$$\lambda_k(A) = \max_{\dim(S)=k}\; \min_{0 \ne y \in S} \frac{y^TAy}{y^Ty}$$

for $k = 1{:}n$.


Proof. Let $Q^TAQ = \mathrm{diag}(\lambda_i)$ be the Schur decomposition with $\lambda_k = \lambda_k(A)$
and $Q = [\,q_1, q_2, \ldots, q_n\,]$. Define

$$S_k = \mathrm{span}\{q_1, \ldots, q_k\},$$

the invariant subspace associated with $\lambda_1, \ldots, \lambda_k$. It is easy to show that

$$\max_{\dim(S)=k}\; \min_{0 \ne y \in S} \frac{y^TAy}{y^Ty} \;\ge\; \min_{0 \ne y \in S_k} \frac{y^TAy}{y^Ty} \;=\; q_k^TAq_k \;=\; \lambda_k(A).$$

To establish the reverse inequality, let $S$ be any $k$-dimensional subspace and
note that it must intersect $\mathrm{span}\{q_k, \ldots, q_n\}$, a subspace that has dimension
$n-k+1$. If $y_* = \alpha_kq_k + \cdots + \alpha_nq_n$ is in this intersection, then

$$\min_{0 \ne y \in S} \frac{y^TAy}{y^Ty} \;\le\; \frac{y_*^TAy_*}{y_*^Ty_*} \;\le\; \lambda_k(A).$$

Since this inequality holds for all $k$-dimensional subspaces,

$$\max_{\dim(S)=k}\; \min_{0 \ne y \in S} \frac{y^TAy}{y^Ty} \;\le\; \lambda_k(A),$$

thereby completing the proof of the theorem. □

If $A \in \mathbb{R}^{n\times n}$ is symmetric positive definite, then $\lambda_n(A) > 0$.
8.1.2 Eigenvalue Sensitivity


An important solution framework for the symmetric eigenproblem involves
the production of a sequence of orthogonal transformations $\{Q_k\}$ with the
property that the matrices $Q_k^TAQ_k$ are progressively “more diagonal.” The
question naturally arises, how well do the diagonal elements of a matrix
approximate its eigenvalues?


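Theorem 8.1.3 (Gershgorin) Suppose $A \in \mathbb{R}^{n\times n}$ is symmetric and that
$Q \in \mathbb{R}^{n\times n}$ is orthogonal. If $Q^TAQ = D + F$ where $D = \mathrm{diag}(d_1, \ldots, d_n)$
and $F$ has zero diagonal entries, then

$$\lambda(A) \subseteq \bigcup_{i=1}^{n} [\,d_i - r_i,\; d_i + r_i\,]$$

where $r_i = \sum_{j=1}^{n} |f_{ij}|$ for $i = 1{:}n$. See Theorem 7.2.1.

Proof. Suppose $\lambda \in \lambda(A)$ and assume without loss of generality that $\lambda \ne d_i$
for $i = 1{:}n$. Since $(D - \lambda I) + F$ is singular, it follows from Lemma 2.3.3
that

$$1 \le \|(D - \lambda I)^{-1}F\|_\infty = \sum_{j=1}^{n} \frac{|f_{kj}|}{|d_k - \lambda|} = \frac{r_k}{|d_k - \lambda|}$$

for some $k$, $1 \le k \le n$. But this implies that $\lambda \in [\,d_k - r_k,\; d_k + r_k\,]$. □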


Example 8.1.2 The matrix

$$A = \begin{bmatrix} 2.0000 & 0.2000 & 0.1000 \\ 0.2000 & 5.0000 & 0.3000 \\ 0.1000 & 0.3000 & -1.0000 \end{bmatrix}$$

has Gershgorin intervals $[1.7, 2.3]$, $[4.5, 5.5]$, and $[-1.4, -.6]$ and eigenvalues 1.9984,
5.0224, and $-1.0208$.
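With $Q = I$, Theorem 8.1.3 reduces to reading off absolute row sums. The Python sketch below (the function name `gershgorin_intervals` is illustrative) computes the intervals for the symmetric matrix of Example 8.1.2 and confirms that the quoted eigenvalues lie in their union; it does not compute eigenvalues itself.

```python
def gershgorin_intervals(A):
    """Intervals [d_i - r_i, d_i + r_i] of Theorem 8.1.3 with Q = I.

    d_i is the diagonal entry a_ii and r_i is the sum of |a_ij| over j != i.
    """
    n = len(A)
    intervals = []
    for i in range(n):
        r = sum(abs(A[i][j]) for j in range(n) if j != i)
        intervals.append((A[i][i] - r, A[i][i] + r))
    return intervals


A = [[2.0, 0.2, 0.1],
     [0.2, 5.0, 0.3],
     [0.1, 0.3, -1.0]]
intervals = gershgorin_intervals(A)
quoted_eigs = [1.9984, 5.0224, -1.0208]     # values quoted in Example 8.1.2
inside = [any(lo <= lam <= hi for lo, hi in intervals) for lam in quoted_eigs]
print(intervals, inside)
```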


The next results show that if A is perturbed by a symmetric matrix E, 
then its eigenvalues do not move by more than || E ||. 


Theorem 8.1.4 (Wielandt-Hoffman) If $A$ and $A + E$ are $n$-by-$n$ symmetric
matrices, then

$$\sum_{i=1}^{n} \bigl(\lambda_i(A + E) - \lambda_i(A)\bigr)^2 \le \|E\|_F^2.$$
Proof. A proof can be found in Wilkinson (1965, pp.104-8) or Stewart
and Sun (1991, pp.189-191). See also P8.1.5. □


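Example 8.1.3 If

$$A = \begin{bmatrix} 6.8 & 2.4 \\ 2.4 & 8.2 \end{bmatrix} \quad\text{and}\quad E = \begin{bmatrix} .002 & .003 \\ .003 & .001 \end{bmatrix}$$

then $\lambda(A) = \{5, 10\}$ and $\lambda(A + E) = \{4.9988, 10.0042\}$ confirming that

$$1.95 \times 10^{-5} = |4.9988 - 5|^2 + |10.0042 - 10|^2 \le \|E\|_F^2 = 2.3 \times 10^{-5}.$$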
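For 2-by-2 symmetric matrices the eigenvalues have a closed form, which makes Example 8.1.3 easy to reproduce. This Python sketch (the helper `sym2_eigs` is an illustrative name) recomputes both spectra and the two sides of the Wielandt-Hoffman inequality for $A = [6.8\;2.4;\,2.4\;8.2]$ and $E = [.002\;.003;\,.003\;.001]$.

```python
import math


def sym2_eigs(a, b, c):
    """Eigenvalues of [[a, b], [b, c]] in decreasing order (closed form)."""
    mid = 0.5 * (a + c)
    rad = math.hypot(0.5 * (a - c), b)
    return mid + rad, mid - rad


A = (6.8, 2.4, 8.2)          # (a11, a12, a22)
E = (0.002, 0.003, 0.001)    # (e11, e12, e22)
lam_A = sym2_eigs(*A)
lam_AE = sym2_eigs(*(x + y for x, y in zip(A, E)))
lhs = sum((p - q) ** 2 for p, q in zip(lam_AE, lam_A))   # sum of squared eigenvalue shifts
fro2 = E[0] ** 2 + 2 * E[1] ** 2 + E[2] ** 2             # ||E||_F^2 (off-diagonal counted twice)
print(lam_A, lam_AE, lhs, fro2)
```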


Theorem 8.1.5 If $A$ and $A + E$ are $n$-by-$n$ symmetric matrices, then

$$\lambda_k(A) + \lambda_n(E) \le \lambda_k(A + E) \le \lambda_k(A) + \lambda_1(E), \qquad k = 1{:}n.$$

Proof. This follows from the minimax characterization. See Wilkinson
(1965, pp.101-2) or Stewart and Sun (1990, p.203). □


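Example 8.1.4 If

$$A = \begin{bmatrix} 6.8 & 2.4 \\ 2.4 & 8.2 \end{bmatrix} \quad\text{and}\quad E = \begin{bmatrix} .002 & .003 \\ .003 & .001 \end{bmatrix}$$

then $\lambda(A) = \{5, 10\}$, $\lambda(E) = \{-.0015, .0045\}$, and $\lambda(A + E) = \{4.9988, 10.0042\}$,
confirming that

$$5 - .0015 \le 4.9988 \le 5 + .0045$$
$$10 - .0015 \le 10.0042 \le 10 + .0045.$$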


Corollary 8.1.6 If $A$ and $A + E$ are $n$-by-$n$ symmetric matrices, then
$$|\lambda_k(A + E) - \lambda_k(A)| \le \|E\|_2$$
for $k = 1{:}n$.
Proof.
$$|\lambda_k(A + E) - \lambda_k(A)| \le \max\{|\lambda_n(E)|, |\lambda_1(E)|\} = \|E\|_2. \quad \square$$
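Theorem 8.1.5 and Corollary 8.1.6 can be spot-checked on the data of Example 8.1.4 in the same closed-form way. This Python sketch (again with an illustrative `sym2_eigs` helper) verifies the two-sided bounds and the 2-norm bound, using $\|E\|_2 = \max\{|\lambda_1(E)|, |\lambda_n(E)|\}$ for symmetric $E$ as noted in §8.1.1.

```python
import math


def sym2_eigs(a, b, c):
    """Eigenvalues of [[a, b], [b, c]] in decreasing order (closed form)."""
    mid = 0.5 * (a + c)
    rad = math.hypot(0.5 * (a - c), b)
    return mid + rad, mid - rad


A = (6.8, 2.4, 8.2)          # (a11, a12, a22)
E = (0.002, 0.003, 0.001)    # (e11, e12, e22)
lam_A = sym2_eigs(*A)
lam_E = sym2_eigs(*E)                                   # (lambda_1(E), lambda_n(E))
lam_AE = sym2_eigs(*(x + y for x, y in zip(A, E)))
norm2_E = max(abs(lam_E[0]), abs(lam_E[1]))             # ||E||_2 for symmetric E

for k in range(2):
    lower, upper = lam_A[k] + lam_E[1], lam_A[k] + lam_E[0]   # Theorem 8.1.5
    print(f"{lower:.4f} <= {lam_AE[k]:.4f} <= {upper:.4f}")
```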


Several more useful perturbation results follow from the minimax property. 


Theorem 8.1.7 (Interlacing Property) If $A \in \mathbb{R}^{n\times n}$ is symmetric and
$A_r = A(1{:}r, 1{:}r)$, then

$$\lambda_{r+1}(A_{r+1}) \le \lambda_r(A_r) \le \lambda_r(A_{r+1}) \le \cdots \le \lambda_2(A_{r+1}) \le \lambda_1(A_r) \le \lambda_1(A_{r+1})$$

for $r = 1{:}n-1$.
Proof. Wilkinson (1965, pp.103-4). □


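Example 8.1.5 If

$$A = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 2 & 3 & 4 \\ 1 & 3 & 6 & 10 \\ 1 & 4 & 10 & 20 \end{bmatrix}$$

then $\lambda(A_1) = \{1\}$, $\lambda(A_2) = \{.3820, 2.6180\}$, $\lambda(A_3) = \{.1270, 1.0000, 7.873\}$, and
$\lambda(A_4) = \{.0380, .4538, 2.2034, 26.3047\}$.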
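Example 8.1.5 can be reproduced without a linear algebra library. The Python sketch below computes eigenvalues with a plain cyclic Jacobi method. That technique is only developed later, in §8.4, and serves here purely as a convenient stand-in eigensolver. The sketch then checks the interlacing inequalities of Theorem 8.1.7 for the leading principal submatrices $A_r$. The function names are illustrative.

```python
import math


def jacobi_eigenvalues(A, tol=1e-14, max_sweeps=50):
    """Eigenvalues (decreasing) of a symmetric matrix by the cyclic Jacobi method."""
    n = len(A)
    a = [list(map(float, row)) for row in A]
    scale = math.sqrt(sum(x * x for row in a for x in row)) or 1.0
    for _ in range(max_sweeps):
        off = math.sqrt(sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j))
        if off <= tol * scale:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation (c, s) chosen so that J^T A J has a zero (p, q) entry.
                tau = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = (1.0 if tau >= 0 else -1.0) / (abs(tau) + math.sqrt(1.0 + tau * tau))
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = t * c
                for k in range(n):                    # A <- A J   (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):                    # A <- J^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                a[p][q] = a[q][p] = 0.0               # exact zero by construction
    return sorted((a[i][i] for i in range(n)), reverse=True)


def interlaces(small, big, slack=1e-10):
    """True if big[i+1] <= small[i] <= big[i] for all i (both lists decreasing)."""
    return all(big[i + 1] - slack <= small[i] <= big[i] + slack for i in range(len(small)))


A = [[1, 1, 1, 1],
     [1, 2, 3, 4],
     [1, 3, 6, 10],
     [1, 4, 10, 20]]
eigs = [jacobi_eigenvalues([row[:r] for row in A[:r]]) for r in range(1, 5)]
for r, lam in enumerate(eigs, start=1):
    print(r, [round(x, 4) for x in lam])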


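Theorem 8.1.8 Suppose $B = A + \tau cc^T$ where $A \in \mathbb{R}^{n\times n}$ is symmetric,
$c \in \mathbb{R}^n$ has unit 2-norm and $\tau \in \mathbb{R}$. If $\tau \ge 0$, then
$$\lambda_i(B) \in [\,\lambda_i(A),\; \lambda_{i-1}(A)\,], \qquad i = 2{:}n,$$
while if $\tau \le 0$ then
$$\lambda_i(B) \in [\,\lambda_{i+1}(A),\; \lambda_i(A)\,], \qquad i = 1{:}n-1.$$
In either case, there exist nonnegative $m_1, \ldots, m_n$ such that
$$\lambda_i(B) = \lambda_i(A) + m_i\tau, \qquad i = 1{:}n,$$
with $m_1 + \cdots + m_n = 1$.

Proof. Wilkinson (1965, pp.94-97). See also P8.1.8. □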
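Theorem 8.1.8 is easy to see in a 2-by-2 case. In the Python sketch below, $A = \mathrm{diag}(3, 1)$, $c = (0.6, 0.8)^T$ and $\tau = \pm 2$ are made-up illustrative data. The sketch computes the eigenvalues of $B = A + \tau cc^T$ in closed form, so the containment intervals can be checked, and recovers the weights $m_i = (\lambda_i(B) - \lambda_i(A))/\tau$, which should be nonnegative and sum to one.

```python
import math


def sym2_eigs(a, b, c):
    """Eigenvalues of [[a, b], [b, c]] in decreasing order (closed form)."""
    mid = 0.5 * (a + c)
    rad = math.hypot(0.5 * (a - c), b)
    return mid + rad, mid - rad


lam_A = (3.0, 1.0)          # A = diag(3, 1), eigenvalues in decreasing order
cvec = (0.6, 0.8)           # unit 2-norm
results = {}
for tau in (2.0, -2.0):
    # B = A + tau * c c^T, stored as (b11, b12, b22)
    B = (lam_A[0] + tau * cvec[0] ** 2, tau * cvec[0] * cvec[1], lam_A[1] + tau * cvec[1] ** 2)
    lam_B = sym2_eigs(*B)
    m = [(lb - la) / tau for lb, la in zip(lam_B, lam_A)]   # lambda_i(B) = lambda_i(A) + m_i*tau
    results[tau] = (lam_B, m)
    print(tau, lam_B, m, sum(m))
```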


8.1.3 Invariant Subspaces 


Many eigenvalue computations proceed by breaking the original problem 
into a collection of smaller subproblems. The following result is the basis 
for this solution framework. 


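Theorem 8.1.9 Suppose $A \in \mathbb{R}^{n\times n}$ is symmetric and that

$$Q = [\,Q_1 \;\; Q_2\,], \qquad Q_1 \in \mathbb{R}^{n\times r},\; Q_2 \in \mathbb{R}^{n\times(n-r)},$$

is orthogonal. If $\mathrm{ran}(Q_1)$ is an invariant subspace, then

$$Q^TAQ = D = \begin{bmatrix} D_1 & 0 \\ 0 & D_2 \end{bmatrix}, \qquad D_1 \in \mathbb{R}^{r\times r}, \tag{8.1.1}$$

and $\lambda(A) = \lambda(D_1) \cup \lambda(D_2)$. See also Lemma 7.1.2.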
Proof. If

$$Q^TAQ = \begin{bmatrix} D_1 & E_{21}^T \\ E_{21} & D_2 \end{bmatrix},$$

then from $AQ = QD$ we have $AQ_1 - Q_1D_1 = Q_2E_{21}$. Since $\mathrm{ran}(Q_1)$ is
invariant, the columns of $Q_2E_{21}$ are also in $\mathrm{ran}(Q_1)$ and therefore perpendicular
to the columns of $Q_2$. Thus,

$$0 = Q_2^T(AQ_1 - Q_1D_1) = Q_2^TQ_2E_{21} = E_{21},$$

and so (8.1.1) holds. It is easy to show

$$\det(A - \lambda I_n) = \det(Q^TAQ - \lambda I_n) = \det(D_1 - \lambda I_r)\det(D_2 - \lambda I_{n-r})$$

confirming that $\lambda(A) = \lambda(D_1) \cup \lambda(D_2)$. □
The sensitivity to perturbation of an invariant subspace depends upon 
the separation of the associated eigenvalues from the rest of the spectrum. 


The appropriate measure of separation between the eigenvalues of two symmetric
matrices $B$ and $C$ is given by

$$\mathrm{sep}(B, C) = \min_{\substack{\lambda \in \lambda(B) \\ \mu \in \lambda(C)}} |\lambda - \mu|. \tag{8.1.2}$$

With this definition we have
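Theorem 8.1.10 Suppose $A$ and $A + E$ are $n$-by-$n$ symmetric matrices
and that

$$Q = [\,Q_1 \;\; Q_2\,], \qquad Q_1 \in \mathbb{R}^{n\times r},\; Q_2 \in \mathbb{R}^{n\times(n-r)},$$

is an orthogonal matrix such that $\mathrm{ran}(Q_1)$ is an invariant subspace for $A$.
Partition the matrices $Q^TAQ$ and $Q^TEQ$ as follows:

$$Q^TAQ = \begin{bmatrix} D_1 & 0 \\ 0 & D_2 \end{bmatrix}, \qquad Q^TEQ = \begin{bmatrix} E_{11} & E_{21}^T \\ E_{21} & E_{22} \end{bmatrix},$$

where $D_1, E_{11} \in \mathbb{R}^{r\times r}$. If $\mathrm{sep}(D_1, D_2) > 0$ and

$$\|E\|_2 \le \frac{\mathrm{sep}(D_1, D_2)}{5},$$

then there exists a matrix $P \in \mathbb{R}^{(n-r)\times r}$ with

$$\|P\|_2 \le \frac{4}{\mathrm{sep}(D_1, D_2)}\,\|E_{21}\|_2$$

such that the columns of $\hat{Q}_1 = (Q_1 + Q_2P)(I + P^TP)^{-1/2}$ define an orthonormal
basis for a subspace that is invariant for $A + E$. See also Theorem 7.2.4.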
Proof. This result is a slight adaptation of Theorem 4.11 in Stewart
(1973). The matrix $(I + P^TP)^{-1/2}$ is the inverse of the square root of
$I + P^TP$. See §4.2.10. □


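Corollary 8.1.11 If the conditions of the theorem hold, then

$$\mathrm{dist}(\mathrm{ran}(Q_1), \mathrm{ran}(\hat{Q}_1)) \le \frac{4}{\mathrm{sep}(D_1, D_2)}\,\|E_{21}\|_2.$$

See also Corollary 7.2.5.
Proof. It can be shown using the SVD that

$$\|P(I + P^TP)^{-1/2}\|_2 \le \|P\|_2. \tag{8.1.3}$$

Since $Q_2^T\hat{Q}_1 = P(I + P^TP)^{-1/2}$ it follows that

$$\mathrm{dist}(\mathrm{ran}(Q_1), \mathrm{ran}(\hat{Q}_1)) = \|Q_2^T\hat{Q}_1\|_2 = \|P(I + P^TP)^{-1/2}\|_2 \le \|P\|_2 \le 4\|E_{21}\|_2/\mathrm{sep}(D_1, D_2). \quad \square$$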


Thus, the reciprocal of $\mathrm{sep}(D_1, D_2)$ can be thought of as a condition number
that measures the sensitivity of $\mathrm{ran}(Q_1)$ as an invariant subspace.

The effect of perturbations on a single eigenvector is sufficiently important
that we specialize the above results to this case.


Theorem 8.1.12 Suppose $A$ and $A + E$ are $n$-by-$n$ symmetric matrices
and that

$$Q = [\,q_1 \;\; Q_2\,], \qquad q_1 \in \mathbb{R}^{n},\; Q_2 \in \mathbb{R}^{n\times(n-1)},$$

is an orthogonal matrix such that $q_1$ is an eigenvector for $A$. Partition the
matrices $Q^TAQ$ and $Q^TEQ$ as follows:

$$Q^TAQ = \begin{bmatrix} \lambda & 0 \\ 0 & D_2 \end{bmatrix}, \qquad Q^TEQ = \begin{bmatrix} \epsilon & e^T \\ e & E_{22} \end{bmatrix}.$$

If $d = \min_{\mu \in \lambda(D_2)} |\lambda - \mu| > 0$ and

$$\|E\|_2 \le \frac{d}{5},$$

then there exists $p \in \mathbb{R}^{n-1}$ satisfying

$$\|p\|_2 \le \frac{4}{d}\,\|e\|_2$$

such that $\hat{q}_1 = (q_1 + Q_2p)/\sqrt{1 + p^Tp}$ is a unit 2-norm eigenvector for $A + E$.
Moreover,

$$\mathrm{dist}(\mathrm{span}\{q_1\}, \mathrm{span}\{\hat{q}_1\}) = \sqrt{1 - (q_1^T\hat{q}_1)^2} \le \frac{4}{d}\,\|e\|_2.$$

See also Corollary 7.2.6.

Proof. Apply Theorem 8.1.10 and Corollary 8.1.11 with $r = 1$ and observe
that if $D_1 = (\lambda)$, then $d = \mathrm{sep}(D_1, D_2)$. □


Example 8.1.6 If $A = \mathrm{diag}(.999, 1.001, 2.)$, and

$$E = \begin{bmatrix} 0.00 & 0.01 & 0.01 \\ 0.01 & 0.00 & 0.01 \\ 0.01 & 0.01 & 0.00 \end{bmatrix},$$

then $\hat{Q}^T(A + E)\hat{Q} = \mathrm{diag}(.9899, 1.0098, 2.0002)$ where

$$\hat{Q} = \begin{bmatrix} -.7418 & .6706 & .0101 \\ .6708 & .7417 & .0101 \\ .0007 & -.0143 & .9999 \end{bmatrix}$$

is orthogonal. Let $\hat{q}_i = \hat{Q}e_i$, $i = 1, 2, 3$. Thus, $\hat{q}_i$ is the perturbation of $A$'s eigenvector
$q_i = e_i$. A calculation shows that

$$\mathrm{dist}(\mathrm{span}\{q_1\}, \mathrm{span}\{\hat{q}_1\}) = \mathrm{dist}(\mathrm{span}\{q_2\}, \mathrm{span}\{\hat{q}_2\}) = .67.$$

Thus, because they are associated with nearby eigenvalues, the eigenvectors $q_1$ and $q_2$
cannot be computed accurately. On the other hand, since $\lambda_1$ and $\lambda_2$ are well separated
from $\lambda_3$, they define a two-dimensional subspace that is not particularly sensitive, as
$\mathrm{dist}(\mathrm{span}\{q_1, q_2\}, \mathrm{span}\{\hat{q}_1, \hat{q}_2\}) = .01$.
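The distances in Example 8.1.6 can be recomputed with any symmetric eigensolver. The Python sketch below again uses a plain cyclic Jacobi method (§8.4) as a stand-in, this time accumulating eigenvectors; the names are illustrative. Since $q_i = e_i$, the one-dimensional distance is $\sqrt{1 - (e_i^T\hat{q}_i)^2}$, and because $e_3$ spans the orthogonal complement of $\mathrm{span}\{e_1, e_2\}$, the two-dimensional distance is the 2-norm of the row vector $e_3^T[\,\hat{q}_1\;\hat{q}_2\,]$.

```python
import math


def jacobi_eig(A, tol=1e-14, max_sweeps=50):
    """Cyclic Jacobi: returns (eigenvalues, V) with A V = V diag(eigenvalues)."""
    n = len(A)
    a = [list(map(float, row)) for row in A]
    V = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    scale = math.sqrt(sum(x * x for row in a for x in row)) or 1.0
    for _ in range(max_sweeps):
        off = math.sqrt(sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j))
        if off <= tol * scale:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                tau = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = (1.0 if tau >= 0 else -1.0) / (abs(tau) + math.sqrt(1.0 + tau * tau))
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = t * c
                for k in range(n):      # A <- A J and V <- V J (columns p, q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                    vkp, vkq = V[k][p], V[k][q]
                    V[k][p], V[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
                for k in range(n):      # A <- J^T A (rows p, q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                a[p][q] = a[q][p] = 0.0
    return [a[i][i] for i in range(n)], V


A_plus_E = [[0.999, 0.01, 0.01],
            [0.01, 1.001, 0.01],
            [0.01, 0.01, 2.0]]
vals, V = jacobi_eig(A_plus_E)
order = sorted(range(3), key=lambda i: vals[i])          # increasing, as in the example
qhat = [[V[k][i] for k in range(3)] for i in order]      # qhat[0], qhat[1], qhat[2]

dist1 = math.sqrt(max(0.0, 1.0 - qhat[0][0] ** 2))      # dist(span{e1}, span{qhat_1})
dist2 = math.sqrt(max(0.0, 1.0 - qhat[1][1] ** 2))      # dist(span{e2}, span{qhat_2})
dist12 = math.hypot(qhat[0][2], qhat[1][2])             # || e3^T [qhat_1 qhat_2] ||_2
print([round(vals[i], 4) for i in order], round(dist1, 2), round(dist2, 2), round(dist12, 4))
```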


8.1.4 Approximate Invariant Subspaces 


If the columns of $Q_1 \in \mathbb{R}^{n\times r}$ are independent and the residual matrix
$R = AQ_1 - Q_1S$ is small for some $S \in \mathbb{R}^{r\times r}$, then the columns of $Q_1$ define an
approximate invariant subspace. Let us discover what we can say about
the eigensystem of $A$ when in the possession of such a matrix.


Theorem 8.1.13 Suppose $A \in \mathbb{R}^{n\times n}$ and $S \in \mathbb{R}^{r\times r}$ are symmetric and
that

$$AQ_1 - Q_1S = E_1,$$

where $Q_1 \in \mathbb{R}^{n\times r}$ satisfies $Q_1^TQ_1 = I_r$. Then there exist $\mu_1, \ldots, \mu_r \in \lambda(A)$
such that

$$|\mu_k - \lambda_k(S)| \le \sqrt{2}\,\|E_1\|_2$$

for $k = 1{:}r$.
Proof. Let $Q_2 \in \mathbb{R}^{n\times(n-r)}$ be any matrix such that $Q = [\,Q_1, Q_2\,]$ is
orthogonal. It follows that

$$Q^TAQ = \begin{bmatrix} S & 0 \\ 0 & Q_2^TAQ_2 \end{bmatrix} + \begin{bmatrix} Q_1^TE_1 & E_1^TQ_2 \\ Q_2^TE_1 & 0 \end{bmatrix} \equiv B + E$$

and so by using Corollary 8.1.6 we have $|\lambda_k(A) - \lambda_k(B)| \le \|E\|_2$ for
$k = 1{:}n$. Since $\lambda(S) \subseteq \lambda(B)$, there exist $\mu_1, \ldots, \mu_r \in \lambda(A)$ such that

$$|\mu_k - \lambda_k(S)| \le \|E\|_2$$

for $k = 1{:}r$. The theorem follows by noting that for any $x \in \mathbb{R}^r$ and
$y \in \mathbb{R}^{n-r}$ we have

$$\left\| E\begin{bmatrix} x \\ y \end{bmatrix} \right\|_2 \le \|E_1x\|_2 + \|E_1^TQ_2y\|_2 \le \|E_1\|_2\|x\|_2 + \|E_1\|_2\|y\|_2,$$

from which we readily conclude that $\|E\|_2 \le \sqrt{2}\,\|E_1\|_2$. □


Example 8.1.7 If 
-[9 2]. a. -[ Mt), ands = Gne 
then 
AQ -Qi$ = | —.0562 | = Bn 
The theorem predicts that $A$ has an eigenvalue within $\sqrt{2}\,\|E_1\|_2 \approx .1415$ of 5.1. This
is true since $\lambda(A) = \{5, 10\}$.


The eigenvalue bounds in Theorem 8.1.13 depend on $\|AQ_1 - Q_1S\|_2$.
Given $A$ and $Q_1$, the following theorem indicates how to choose $S$ so that
this quantity is minimized in the Frobenius norm.


Theorem 8.1.14 If $A \in \mathbb{R}^{n\times n}$ is symmetric and $Q_1 \in \mathbb{R}^{n\times r}$ has orthonormal
columns, then

$$\min_{S \in \mathbb{R}^{r\times r}} \|AQ_1 - Q_1S\|_F = \|(I - Q_1Q_1^T)AQ_1\|_F$$

and $S = Q_1^TAQ_1$ is the minimizer.

Proof. Let $Q_2 \in \mathbb{R}^{n\times(n-r)}$ be such that $Q = [\,Q_1, Q_2\,]$ is orthogonal. For
any $S \in \mathbb{R}^{r\times r}$ we have

$$\|AQ_1 - Q_1S\|_F^2 = \|Q^TAQ_1 - Q^TQ_1S\|_F^2 = \|Q_1^TAQ_1 - S\|_F^2 + \|Q_2^TAQ_1\|_F^2.$$

Clearly, the minimizing $S$ is given by $S = Q_1^TAQ_1$, and since $Q_2Q_2^T = I - Q_1Q_1^T$,
the remaining term is $\|(I - Q_1Q_1^T)AQ_1\|_F^2$. □


This result enables us to associate any $r$-dimensional subspace $\mathrm{ran}(Q_1)$
with a set of $r$ “optimal” eigenvalue-eigenvector approximates.


Theorem 8.1.15 Suppose $A \in \mathbb{R}^{n\times n}$ is symmetric and that $Q_1 \in \mathbb{R}^{n\times r}$
satisfies $Q_1^TQ_1 = I_r$. If

$$Z^T(Q_1^TAQ_1)Z = \mathrm{diag}(\theta_1, \ldots, \theta_r) = D$$

is the Schur decomposition of $Q_1^TAQ_1$ and $Q_1Z = [\,y_1, \ldots, y_r\,]$, then

$$\|Ay_k - \theta_ky_k\|_2 = \|(I - Q_1Q_1^T)AQ_1Ze_k\|_2 \le \|(I - Q_1Q_1^T)AQ_1\|_2$$

for $k = 1{:}r$.
Proof.
$$Ay_k - \theta_ky_k = AQ_1Ze_k - Q_1ZDe_k = (AQ_1 - Q_1(Q_1^TAQ_1))Ze_k.$$
The theorem follows by taking norms. □

In Theorem 8.1.15, the $\theta_k$ are called Ritz values, the $y_k$ are called Ritz
vectors, and the $(\theta_k, y_k)$ are called Ritz pairs.
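The Rayleigh-Ritz recipe of Theorem 8.1.15 fits in a few lines when $r = 2$, because the 2-by-2 matrix $Q_1^TAQ_1$ can be diagonalized in closed form. The Python sketch below assumes orthonormal input vectors $q_1, q_2$ and uses the 4-by-4 matrix of Example 8.1.5 with $Q_1 = [\,e_1\;e_2\,]$, so the Ritz values coincide with $\lambda(A_2)$ there. It checks each residual against $\|(I - Q_1Q_1^T)AQ_1\|_F$, an upper bound for the 2-norm appearing in the theorem. All names are illustrative.

```python
import math


def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def ritz_pairs_2d(A, q1, q2):
    """Ritz pairs (theta, y) of symmetric A from span{q1, q2}; q1, q2 orthonormal."""
    Aq1, Aq2 = matvec(A, q1), matvec(A, q2)
    a, b, c = dot(q1, Aq1), dot(q1, Aq2), dot(q2, Aq2)       # S = Q1^T A Q1
    if b == 0.0:
        return sorted([(a, q1[:]), (c, q2[:])], key=lambda pair: -pair[0])
    mid, rad = 0.5 * (a + c), math.hypot(0.5 * (a - c), b)
    pairs = []
    for theta in (mid + rad, mid - rad):
        z0, z1 = b, theta - a                                 # eigenvector of S for theta
        nz = math.hypot(z0, z1)
        y = [(z0 * u + z1 * v) / nz for u, v in zip(q1, q2)]  # Ritz vector y = Q1 z
        pairs.append((theta, y))
    return pairs


A = [[1, 1, 1, 1], [1, 2, 3, 4], [1, 3, 6, 10], [1, 4, 10, 20]]
q1, q2 = [1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]
pairs = ritz_pairs_2d(A, q1, q2)

# Columns of (I - Q1 Q1^T) A Q1 and their Frobenius norm.
cols = []
for q in (q1, q2):
    Aq = matvec(A, q)
    cols.append([x - dot(q1, Aq) * u - dot(q2, Aq) * v for x, u, v in zip(Aq, q1, q2)])
bound = math.sqrt(sum(x * x for col in cols for x in col))

residuals = []
for theta, y in pairs:
    r = [ay - theta * yi for ay, yi in zip(matvec(A, y), y)]
    residuals.append(math.sqrt(dot(r, r)))
    print(round(theta, 4), round(residuals[-1], 4), round(bound, 4))
```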

The usefulness of Theorem 8.1.13 is enhanced if we weaken the assumption
that the columns of $Q_1$ are orthonormal. As can be expected, the
bounds deteriorate with the loss of orthogonality.


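Theorem 8.1.16 Suppose $A \in \mathbb{R}^{n\times n}$ is symmetric and that
$$AX_1 - X_1S = F_1,$$
where $X_1 \in \mathbb{R}^{n\times r}$ and $S = X_1^TAX_1$. If
$$\|X_1^TX_1 - I_r\|_2 = \tau < 1, \tag{8.1.4}$$
then there exist $\mu_1, \ldots, \mu_r \in \lambda(A)$ such that
$$|\mu_k - \lambda_k(S)| \le \sqrt{2}\,\bigl(\|F_1\|_2 + \tau(2 + \tau)\|A\|_2\bigr)$$
for $k = 1{:}r$.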


Proof. Let $X_1 = ZP$ be the polar decomposition of $X_1$. Recall from
§4.2.10 that this means $Z \in \mathbb{R}^{n\times r}$ has orthonormal columns and $P \in \mathbb{R}^{r\times r}$
is a symmetric positive semidefinite matrix that satisfies $P^2 = X_1^TX_1$.
Taking norms in the equation

$$\begin{aligned}
E_1 \equiv AZ - ZS &= (AX_1 - X_1S) + A(Z - X_1) - (Z - X_1)S \\
&= F_1 + AZ(I - P) - Z(I - P)X_1^TAX_1
\end{aligned}$$

gives

$$\|E_1\|_2 \le \|F_1\|_2 + \|A\|_2\,\|I - P\|_2\,(1 + \|X_1\|_2^2). \tag{8.1.5}$$

Equation (8.1.4) implies that

$$\|X_1\|_2^2 \le 1 + \tau. \tag{8.1.6}$$

Since $P$ is positive semidefinite, $(I + P)$ is nonsingular and so

$$I - P = (I + P)^{-1}(I - P^2) = (I + P)^{-1}(I - X_1^TX_1),$$

which implies $\|I - P\|_2 \le \tau$. By substituting this inequality and (8.1.6)
into (8.1.5) we have $\|E_1\|_2 \le \|F_1\|_2 + \tau(2 + \tau)\|A\|_2$. The proof is
completed by noting that we can use Theorem 8.1.13 with $Q_1 = Z$ to
relate the eigenvalues of $A$ and $S$ via the residual $E_1$. □


8.1.5 The Law of Inertia 


The inertia of a symmetric matrix $A$ is a triplet of nonnegative integers
$(m, z, p)$ where $m$, $z$, and $p$ are respectively the number of negative, zero,
and positive elements of $\lambda(A)$.


Theorem 8.1.17 (Sylvester Law of Inertia) If $A \in \mathbb{R}^{n\times n}$ is symmetric
and $X \in \mathbb{R}^{n\times n}$ is nonsingular, then $A$ and $X^TAX$ have the same inertia.
Proof. Suppose for some $r$ that $\lambda_r(A) > 0$ and define the subspace $S_0 \subseteq \mathbb{R}^n$ by

$$S_0 = \mathrm{span}\{X^{-1}q_1, \ldots, X^{-1}q_r\}, \qquad q_i \ne 0,$$

where $Aq_i = \lambda_i(A)q_i$ and $i = 1{:}r$. From the minimax characterization of
$\lambda_r(X^TAX)$ we have

$$\lambda_r(X^TAX) = \max_{\dim(S)=r}\; \min_{0 \ne y \in S} \frac{y^T(X^TAX)y}{y^Ty} \;\ge\; \min_{0 \ne y \in S_0} \frac{y^T(X^TAX)y}{y^Ty}.$$

Since

$$y \in \mathbb{R}^n \;\Rightarrow\; \frac{y^T(X^TX)y}{y^Ty} \ge \sigma_n(X)^2, \qquad
y \in S_0 \;\Rightarrow\; \frac{y^T(X^TAX)y}{y^T(X^TX)y} \ge \lambda_r(A),$$

it follows that

$$\lambda_r(X^TAX) \ge \min_{0 \ne y \in S_0} \left\{ \frac{y^T(X^TAX)y}{y^T(X^TX)y}\cdot\frac{y^T(X^TX)y}{y^Ty} \right\} \ge \lambda_r(A)\,\sigma_n(X)^2.$$

An analogous argument with the roles of $A$ and $X^TAX$ reversed shows that

$$\lambda_r(A) \ge \lambda_r(X^TAX)\,\sigma_n(X^{-1})^2 = \frac{\lambda_r(X^TAX)}{\sigma_1(X)^2}.$$

Thus, $\lambda_r(A)$ and $\lambda_r(X^TAX)$ have the same sign and so we have shown that
$A$ and $X^TAX$ have the same number of positive eigenvalues. If we apply
this result to $-A$, we conclude that $A$ and $X^TAX$ have the same number of
negative eigenvalues. Obviously, the number of zero eigenvalues possessed
by each matrix is also the same. □


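Example 8.1.8 If $A = \mathrm{diag}(3, 2, -1)$ and

$$X = \begin{bmatrix} 1 & 4 & 5 \\ 0 & 1 & 2 \\ 0 & 0 & 1 \end{bmatrix},$$

then

$$X^TAX = \begin{bmatrix} 3 & 12 & 15 \\ 12 & 50 & 64 \\ 15 & 64 & 82 \end{bmatrix}$$

and $\lambda(X^TAX) = \{134.769, .3555, -.1252\}$.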
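Sylvester's law also gives a way to count eigenvalue signs without computing eigenvalues: if symmetric elimination without pivoting succeeds, then $A = LDL^T$ with $L$ unit lower triangular, so $A$ and $D$ are congruent and share an inertia. The Python sketch below assumes every leading principal submatrix is nonsingular (otherwise a symmetric pivoting scheme would be needed) and applies this to the matrix $X^TAX$ of Example 8.1.8. The function names are illustrative.

```python
def ldlt_diagonal(A):
    """Pivots d_1, ..., d_n of A = L D L^T (no pivoting; A symmetric)."""
    n = len(A)
    a = [list(map(float, row)) for row in A]
    d = []
    for k in range(n):
        piv = a[k][k]
        if piv == 0.0:
            raise ValueError("zero pivot: a symmetric pivoting strategy is required")
        d.append(piv)
        for i in range(k + 1, n):            # eliminate below the pivot
            f = a[i][k] / piv
            for j in range(k + 1, n):
                a[i][j] -= f * a[k][j]
    return d


def inertia(A):
    """(m, z, p) = numbers of negative, zero, positive eigenvalues of A.

    ldlt_diagonal rejects zero pivots, so A is nonsingular here and z = 0.
    """
    d = ldlt_diagonal(A)
    m = sum(1 for x in d if x < 0)
    p = sum(1 for x in d if x > 0)
    return (m, len(d) - m - p, p)


XtAX = [[3, 12, 15], [12, 50, 64], [15, 64, 82]]
print(ldlt_diagonal(XtAX), inertia(XtAX))   # same inertia as A = diag(3, 2, -1)
```

Here $L = X^T$ and the pivots reproduce $\mathrm{diag}(3, 2, -1)$ exactly, which is just Example 8.1.8 read backwards.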


Problems 


P8.1.1 Without using any of the results in this section, show that the eigenvalues of a 
2-by-2 symmetric matrix must be real. 


P8.1.2 Compute the Schur decomposition of A = | 1 : | . 


P8.1.3 Show that the eigenvalues of a Hermitian matrix ($A^H = A$) are real. For
each theorem and corollary in this section, state and prove the corresponding result for
Hermitian matrices. Which results have analogs when $A$ is skew-symmetric? (Hint: If
$A^T = -A$, then $iA$ is Hermitian.)


P8.1.4 Show that if $X \in \mathbb{R}^{n\times r}$, $r \le n$, and $\|X^TX - I\|_2 = \tau < 1$, then $\sigma_{\min}(X) \ge 1 - \tau$.


P8.1.5 Suppose $A, E \in \mathbb{R}^{n\times n}$ are symmetric and consider the Schur decomposition
$A + tE = QDQ^T$ where we assume that $Q = Q(t)$ and $D = D(t)$ are continuously differentiable
functions of $t \in \mathbb{R}$. Show that $\dot{D}(t) = \mathrm{diag}(Q(t)^TEQ(t))$ where the matrix on
the right is the diagonal part of $Q(t)^TEQ(t)$. Establish the Wielandt-Hoffman theorem
by integrating both sides of this equation from 0 to 1 and taking Frobenius norms to
show that

$$\|D(1) - D(0)\|_F \le \int_0^1 \|\mathrm{diag}(Q(t)^TEQ(t))\|_F\,dt \le \|E\|_F.$$


P8.1.6 Prove Theorem 8.1.5. 
P8.1.7 Prove Theorem 8.1.7. 


P8.1.8 If $C \in \mathbb{R}^{n\times n}$ then the trace function $\mathrm{tr}(C) = c_{11} + \cdots + c_{nn}$ equals the sum of
$C$'s eigenvalues. Use this to prove Theorem 8.1.8.


P8.1.9 Show that if $B \in \mathbb{R}^{m\times m}$ and $C \in \mathbb{R}^{n\times n}$ are symmetric, then $\mathrm{sep}(B, C) = \min \|BX - XC\|_F$
where the min is taken over all matrices $X \in \mathbb{R}^{m\times n}$ with $\|X\|_F = 1$.
P8.1.10 Prove the inequality (8.1.3). 


P8.1.11 Suppose $A \in \mathbb{R}^{n\times n}$ is symmetric and $C \in \mathbb{R}^{n\times r}$ has full column rank and
assume that $r \ll n$. By using Theorem 8.1.8 relate the eigenvalues of $A + CC^T$ to the
eigenvalues of $A$.


Notes and References for Sec. 8.1 


The perturbation theory for the symmetric eigenvalue problem is surveyed in Wilkinson 
(1965, chapter 2), Parlett (1980, chapters 10 and 11), and Stewart and Sun (1990, chap- 
ters 4 and 5). Some representative papers in this well-researched area include 


G.W. Stewart (1973). “Error and Perturbation Bounds for Subspaces Associated with 
Certain Eigenvalue Problems,” SIAM Review 15, 727-64. 

C.C. Paige (1974). “Eigenvalues of Perturbed Hermitian Matrices,” Lin. Alg. and Its
Applic. 8, 1-10.

A. Ruhe (1975). “On the Closeness of Eigenvalues and Singular Values for Almost
Normal Matrices,” Lin. Alg. and Its Applic. 11, 87-94.

W. Kahan (1975). “Spectra of Nearly Hermitian Matrices,” Proc. Amer. Math. Soc. 
48, 11-17. 

A. Schonhage (1979). “Arbitrary Perturbations of Hermitian Matrices,” Lin. Alg. and 
Its Applic. 24, 143-49. 

P. Deift, T. Nanda, and C. Tomei (1983). “Ordinary Differential Equations and the 
Symmetric Eigenvalue Problem,” SIAM J. Numer. Anal. 20, 1-22. 

D.S. Scott (1985). “On the Accuracy of the Gershgorin Circle Theorem for Bounding
the Spread of a Real Symmetric Matrix,” Lin. Alg. and Its Applic. 65, 147-155.

J.-G. Sun (1995). “A Note on Backward Error Perturbations for the Hermitian Eigen-
value Problem,” BIT 35, 385-393.

R.-C. Li (1996). “Relative Perturbation Theory (I) Eigenvalue and Singular Value Vari- 
ations,” Technical Report UCB//CSD-94-855, Department of EECS, University of 
California at Berkeley. 

R.-C. Li (1996). “Relative Perturbation Theory (II) Eigenspace and Singular Subspace 
Variations,” Technical Report UCB//CSD-94-856, Department of EECS, University 
of California at Berkeley. 


8.2 Power Iterations 


Assume that $A \in \mathbb{R}^{n\times n}$ is symmetric and that $U_0 \in \mathbb{R}^{n\times n}$ is orthogonal.
Consider the following QR iteration:

    T_0 = U_0^T A U_0
    for k = 1, 2, ...
        T_{k-1} = U_k R_k    (QR factorization)                 (8.2.1)
        T_k = R_k U_k
    end

Since $T_k = R_kU_k = U_k^T(U_kR_k)U_k = U_k^TT_{k-1}U_k$ it follows by induction
that

$$T_k = (U_0U_1\cdots U_k)^TA(U_0U_1\cdots U_k). \tag{8.2.2}$$

Thus, each $T_k$ is orthogonally similar to $A$. Moreover, the $T_k$ almost always
converge to diagonal form and so it can be said that (8.2.1) almost
always “converges” to a Schur decomposition of $A$. In order to establish
this remarkable result we first consider the power method and the method
of orthogonal iteration.
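Iteration (8.2.1) is short enough to run directly. The Python sketch below takes $U_0 = I$, computes each QR factorization by modified Gram-Schmidt (a choice made for brevity, not the Householder approach a careful code would use), and applies the iteration to the 4-by-4 matrix of Example 8.2.1, whose eigenvalues are 12, 8, −4, and −2. The helper names are illustrative.

```python
import math


def matmul(X, Y):
    """Dense product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def qr_mgs(T):
    """QR factorization T = U R of a nonsingular square matrix by modified Gram-Schmidt."""
    n = len(T)
    cols = [[T[i][j] for i in range(n)] for j in range(n)]
    U_cols, R = [], [[0.0] * n for _ in range(n)]
    for j in range(n):
        v = cols[j][:]
        for i, u in enumerate(U_cols):        # subtract projections one at a time
            R[i][j] = sum(a * b for a, b in zip(u, v))
            v = [a - R[i][j] * b for a, b in zip(v, u)]
        R[j][j] = math.sqrt(sum(a * a for a in v))
        U_cols.append([a / R[j][j] for a in v])
    U = [[U_cols[j][i] for j in range(n)] for i in range(n)]
    return U, R


def qr_iteration(A, steps):
    """Unshifted QR iteration (8.2.1) with U_0 = I: T_{k-1} = U_k R_k, T_k = R_k U_k."""
    T = [row[:] for row in A]
    for _ in range(steps):
        U, R = qr_mgs(T)
        T = matmul(R, U)
    return T


A = [[-1.6407, 1.0814, 1.2014, 1.1539],
     [1.0814, 4.1573, 7.4035, -1.0463],
     [1.2014, 7.4035, 2.7890, -1.5737],
     [1.1539, -1.0463, -1.5737, 8.6944]]
T = qr_iteration(A, 200)
print([round(T[i][i], 3) for i in range(4)])
```

The off-diagonal entries decay like powers of the ratios $|\lambda_{i+1}/\lambda_i|$, so 200 steps are far more than needed for this matrix; shifts (§8.3) are what make the method practical in general.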


8.2.1 The Power Method 


Given a unit 2-norm $q^{(0)} \in \mathbb{R}^n$, the power method produces a sequence of
vectors $q^{(k)}$ as follows:

    for k = 1, 2, ...
        z^{(k)} = A q^{(k-1)}
        q^{(k)} = z^{(k)} / ||z^{(k)}||_2                       (8.2.3)
        λ^{(k)} = [q^{(k)}]^T A q^{(k)}
    end

If $q^{(0)}$ is not “deficient” and $A$'s eigenvalue of maximum modulus is unique,
then the $q^{(k)}$ converge to an eigenvector.


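Theorem 8.2.1 Suppose $A \in \mathbb{R}^{n\times n}$ is symmetric and that

$$Q^TAQ = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$$

where $Q = [\,q_1, \ldots, q_n\,]$ is orthogonal and $|\lambda_1| > |\lambda_2| \ge \cdots \ge |\lambda_n|$. Let the
vectors $q^{(k)}$ be specified by (8.2.3) and define $\theta_k \in [0, \pi/2]$ by

$$\cos(\theta_k) = |q_1^Tq^{(k)}|.$$

If $\cos(\theta_0) \ne 0$, then

$$|\sin(\theta_k)| \le \tan(\theta_0)\left|\frac{\lambda_2}{\lambda_1}\right|^k \tag{8.2.4}$$

$$|\lambda^{(k)} - \lambda_1| \le |\lambda_1 - \lambda_n|\tan(\theta_0)^2\left|\frac{\lambda_2}{\lambda_1}\right|^{2k} \tag{8.2.5}$$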


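Proof. From the definition of the iteration, it follows that $q^{(k)}$ is a multiple
of $A^kq^{(0)}$ and so

$$|\sin(\theta_k)|^2 = 1 - (q_1^Tq^{(k)})^2 = 1 - \left(\frac{q_1^TA^kq^{(0)}}{\|A^kq^{(0)}\|_2}\right)^2.$$

If $q^{(0)}$ has the eigenvector expansion $q^{(0)} = a_1q_1 + \cdots + a_nq_n$, then

$$|a_1| = |q_1^Tq^{(0)}| = \cos(\theta_0) \ne 0, \qquad a_1^2 + \cdots + a_n^2 = 1,$$

and

$$A^kq^{(0)} = a_1\lambda_1^kq_1 + a_2\lambda_2^kq_2 + \cdots + a_n\lambda_n^kq_n.$$

Thus,

$$
\begin{aligned}
|\sin(\theta_k)|^2 &= 1 - \frac{a_1^2\lambda_1^{2k}}{\sum_{i=1}^n a_i^2\lambda_i^{2k}}
 = \frac{\sum_{i=2}^n a_i^2\lambda_i^{2k}}{\sum_{i=1}^n a_i^2\lambda_i^{2k}}
 \le \frac{\sum_{i=2}^n a_i^2\lambda_i^{2k}}{a_1^2\lambda_1^{2k}} \\
&= \frac{1}{a_1^2}\sum_{i=2}^n a_i^2\left(\frac{\lambda_i}{\lambda_1}\right)^{2k}
 \le \frac{1 - a_1^2}{a_1^2}\left(\frac{\lambda_2}{\lambda_1}\right)^{2k}
 = \tan(\theta_0)^2\left(\frac{\lambda_2}{\lambda_1}\right)^{2k}.
\end{aligned}
$$

This proves (8.2.4). Likewise,

$$\lambda^{(k)} = [q^{(k)}]^TAq^{(k)} = \frac{[q^{(0)}]^TA^{2k+1}q^{(0)}}{[q^{(0)}]^TA^{2k}q^{(0)}} = \frac{\sum_{i=1}^n a_i^2\lambda_i^{2k+1}}{\sum_{i=1}^n a_i^2\lambda_i^{2k}}$$

and so

$$
\begin{aligned}
|\lambda^{(k)} - \lambda_1| &= \frac{\left|\sum_{i=2}^n a_i^2\lambda_i^{2k}(\lambda_i - \lambda_1)\right|}{\sum_{i=1}^n a_i^2\lambda_i^{2k}}
 \le |\lambda_1 - \lambda_n|\,\frac{1}{a_1^2}\sum_{i=2}^n a_i^2\left(\frac{\lambda_i}{\lambda_1}\right)^{2k} \\
&\le |\lambda_1 - \lambda_n|\tan(\theta_0)^2\left(\frac{\lambda_2}{\lambda_1}\right)^{2k}. \quad \square
\end{aligned}
$$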


Example 8.2.1 The eigenvalues of

$$A = \begin{bmatrix} -1.6407 & 1.0814 & 1.2014 & 1.1539 \\ 1.0814 & 4.1573 & 7.4035 & -1.0463 \\ 1.2014 & 7.4035 & 2.7890 & -1.5737 \\ 1.1539 & -1.0463 & -1.5737 & 8.6944 \end{bmatrix}$$

are given by $\lambda(A) = \{12, 8, -4, -2\}$. If (8.2.3) is applied to this matrix with
$q^{(0)} = [\,1\;0\;0\;0\,]^T$, then

     k      λ^(k)
     1      2.3156
     2      8.6802
     3     10.3163
     4     11.0663
     5     11.5259
     6     11.7747
     7     11.8967
     8     11.9534
     9     11.9792
    10     11.9907

Observe the convergence to $\lambda_1 = 12$ with rate $|\lambda_2/\lambda_1|^{2k} = (8/12)^{2k} = (4/9)^k$.


Computable error bounds for the power method can be obtained by using
Theorem 8.1.13. If

$$\| A q^{(k)} - \lambda^{(k)} q^{(k)} \|_2 = \delta,$$

then there exists $\lambda \in \lambda(A)$ such that $|\lambda^{(k)} - \lambda| \le \sqrt{2}\,\delta$.
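The iteration (8.2.3) and the residual-based bound above can be sketched in a few lines of plain Python (standard library only; matrices are lists of rows). This is a minimal illustration under those assumptions, not a production routine: the helper names `matvec`, `dot`, and `power_method` are our own, and the $\sqrt{2}\,\delta$ bound is simply evaluated from the final residual. Run on the matrix of Example 8.2.1, it should reproduce the $\lambda^{(k)}$ column of that example to roughly four digits.

```python
import math

def matvec(A, x):
    """Return A x for a matrix stored as a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def power_method(A, q0, iters):
    """Power method (8.2.3).

    Returns the final unit vector q^(k), the list of Rayleigh quotients
    lambda^(1), ..., lambda^(k), and delta = ||A q^(k) - lambda^(k) q^(k)||_2.
    """
    q = list(q0)
    lams = []
    for _ in range(iters):
        z = matvec(A, q)                      # z^(k) = A q^(k-1)
        nz = math.sqrt(dot(z, z))
        q = [zi / nz for zi in z]             # q^(k) = z^(k) / ||z^(k)||_2
        lams.append(dot(q, matvec(A, q)))     # lambda^(k) = q^T A q
    r = [a - lams[-1] * b for a, b in zip(matvec(A, q), q)]
    return q, lams, math.sqrt(dot(r, r))

# Matrix of Example 8.2.1 (eigenvalues 12, 8, -4, -2 to the printed precision)
A = [[-1.6407, 1.0814, 1.2014, 1.1539],
     [1.0814, 4.1573, 7.4035, -1.0463],
     [1.2014, 7.4035, 2.7890, -1.5737],
     [1.1539, -1.0463, -1.5737, 8.6944]]

q, lams, delta = power_method(A, [1.0, 0.0, 0.0, 0.0], 10)
for k, lam in enumerate(lams, start=1):
    print(f"{k:2d}  {lam:8.4f}")
print("sqrt(2)*delta =", math.sqrt(2) * delta)  # some eigenvalue lies this close to lambda^(10)
```

Because the entries of $A$ are printed to four decimals, its eigenvalues differ from $\{12, 8, -4, -2\}$ in roughly the fourth decimal place, so agreement beyond that should not be expected.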


8.2.2 Inverse Iteration 


Suppose the power method is applied with $A$ replaced by $(A - \lambda I)^{-1}$. If $\lambda$
is very close to a distinct eigenvalue of $A$, then the next iterate vector will
be very rich in the corresponding eigendirection:

$$\left.\begin{aligned} x &= \sum_{i=1}^n a_i q_i \\ A q_i &= \lambda_i q_i, \quad i = 1{:}n \end{aligned}\right\}
\quad \Longrightarrow \quad (A - \lambda I)^{-1} x = \sum_{i=1}^n \frac{a_i}{\lambda_i - \lambda}\, q_i .$$

Thus, if $\lambda \approx \lambda_j$ and $a_j$ is not too small, then this vector has a strong
component in the direction of $q_j$. This process is called inverse iteration
and it requires the solution of a linear system with matrix of coefficients
$A - \lambda I$.
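A minimal sketch of inverse iteration in plain Python follows. The system with coefficient matrix $A - \lambda I$ is solved from scratch by Gaussian elimination with partial pivoting at every step, purely for brevity; a practical code would factor $A - \lambda I$ once and reuse the factors. The names `solve` and `inverse_iteration` are illustrative, and the shifts $7.9$ and $-3.9$ used below are arbitrary values near the eigenvalues $8$ and $-4$ of the matrix in Example 8.2.1.

```python
import math

def solve(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(M)
    a = [list(row) + [bi] for row, bi in zip(M, b)]     # augmented copy
    for k in range(n):
        piv = max(range(k, n), key=lambda i: abs(a[i][k]))
        a[k], a[piv] = a[piv], a[k]
        for i in range(k + 1, n):
            m = a[i][k] / a[k][k]
            for j in range(k, n + 1):
                a[i][j] -= m * a[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (a[i][n] - sum(a[i][j] * x[j] for j in range(i + 1, n))) / a[i][i]
    return x

def inverse_iteration(A, shift, x0, iters):
    """Power method applied to (A - shift*I)^{-1}; returns (x, Rayleigh quotient of x)."""
    n = len(A)
    M = [[A[i][j] - (shift if i == j else 0.0) for j in range(n)] for i in range(n)]
    x = list(x0)
    for _ in range(iters):
        z = solve(M, x)                                 # (A - shift I) z = x
        nz = math.sqrt(sum(zi * zi for zi in z))
        x = [zi / nz for zi in z]
    Ax = [sum(a * xj for a, xj in zip(row, x)) for row in A]
    return x, sum(a * b for a, b in zip(x, Ax))

A = [[-1.6407, 1.0814, 1.2014, 1.1539],
     [1.0814, 4.1573, 7.4035, -1.0463],
     [1.2014, 7.4035, 2.7890, -1.5737],
     [1.1539, -1.0463, -1.5737, 8.6944]]
x, lam = inverse_iteration(A, 7.9, [0.5, 0.5, 0.5, 0.5], 5)
print(lam)  # close to the eigenvalue 8
```

Each step multiplies the unwanted components by at most $|\lambda - \lambda_j| / \min_{i \ne j} |\lambda - \lambda_i|$, which is about $0.1/4.1$ for the shift $7.9$, so a handful of steps suffices here.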


8.2.3 Rayleigh Quotient Iteration


Suppose $A \in \mathbb{R}^{n \times n}$ is symmetric and that $x$ is a given nonzero $n$-vector. A
simple differentiation reveals that

$$\lambda = r(x) = \frac{x^T A x}{x^T x}$$

minimizes $\| (A - \lambda I) x \|_2$. (See also Theorem 8.1.14.) The scalar $r(x)$ is
called the Rayleigh quotient of $x$. Clearly, if $x$ is an approximate eigenvector,
then $r(x)$ is a reasonable choice for the corresponding eigenvalue.
Combining this idea with inverse iteration gives rise to the Rayleigh quotient
iteration:


    x_0 given, || x_0 ||_2 = 1
    for k = 0, 1, ...
        μ_k = r(x_k)                                         (8.2.6)
        Solve (A - μ_k I) z_{k+1} = x_k for z_{k+1}
        x_{k+1} = z_{k+1} / || z_{k+1} ||_2
    end


Example 8.2.2 If (8.2.6) is applied to

$$A = \begin{bmatrix}
1 & 1 &  1 &  1 &   1 &   1 \\
1 & 2 &  3 &  4 &   5 &   6 \\
1 & 3 &  6 & 10 &  15 &  21 \\
1 & 4 & 10 & 20 &  35 &  56 \\
1 & 5 & 15 & 35 &  70 & 126 \\
1 & 6 & 21 & 56 & 126 & 252
\end{bmatrix}$$

with $x_0 = [1, 1, 1, 1, 1, 1]^T/\sqrt{6}$, then

    k     μ_k
    0   153.8333
    1   120.0571
    2    49.5011
    3    13.8687
    4    15.4959
    5    15.5534

The iteration is converging to the eigenvalue $\lambda = 15.5534732737$.
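The Rayleigh quotient iteration (8.2.6) differs from inverse iteration only in that the shift is refreshed at every step. Below is a self-contained plain-Python sketch (again with an illustrative `solve` helper that uses partial pivoting). Because $A - \mu_k I$ becomes nearly singular as $\mu_k$ converges, the sketch stops once the residual $\| A x_k - \mu_k x_k \|_2$ is negligible; that stopping rule is our own choice and not part of (8.2.6). Applied to the matrix of Example 8.2.2, it should follow the $\mu_k$ column of that example.

```python
import math

def solve(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(M)
    a = [list(row) + [bi] for row, bi in zip(M, b)]
    for k in range(n):
        piv = max(range(k, n), key=lambda i: abs(a[i][k]))
        a[k], a[piv] = a[piv], a[k]
        for i in range(k + 1, n):
            m = a[i][k] / a[k][k]
            for j in range(k, n + 1):
                a[i][j] -= m * a[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (a[i][n] - sum(a[i][j] * x[j] for j in range(i + 1, n))) / a[i][i]
    return x

def rayleigh_quotient_iteration(A, x0, tol=1e-10, maxit=20):
    """Rayleigh quotient iteration (8.2.6); returns (x, [mu_0, mu_1, ...])."""
    n = len(A)
    nx = math.sqrt(sum(v * v for v in x0))
    x = [v / nx for v in x0]                        # ||x_0||_2 = 1
    mus = []
    for _ in range(maxit):
        Ax = [sum(a * xj for a, xj in zip(row, x)) for row in A]
        mu = sum(a * b for a, b in zip(x, Ax))      # mu_k = r(x_k)
        mus.append(mu)
        res = math.sqrt(sum((a - mu * b) ** 2 for a, b in zip(Ax, x)))
        if res <= tol * max(1.0, abs(mu)):
            break                                   # x_k is an eigenvector to working accuracy
        M = [[A[i][j] - (mu if i == j else 0.0) for j in range(n)] for i in range(n)]
        z = solve(M, x)                             # (A - mu_k I) z_{k+1} = x_k
        nz = math.sqrt(sum(v * v for v in z))
        x = [v / nz for v in z]                     # x_{k+1} = z_{k+1} / ||z_{k+1}||_2
    return x, mus

# Matrix of Example 8.2.2: entry (i, j) is the binomial coefficient C(i+j, i), i, j = 0:5
A = [[math.comb(i + j, i) * 1.0 for j in range(6)] for i in range(6)]
x, mus = rayleigh_quotient_iteration(A, [1.0] * 6)
for k, mu in enumerate(mus):
    print(f"{k}  {mu:.10f}")
```

Note that the starting vector only needs to be nonzero: it is normalized inside the function, and $\mu_0 = r(x_0)$ is unaffected by scaling.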


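The Rayleigh quotient iteration almost always converges and when it
does, the rate of convergence is cubic. We demonstrate this for the case
$n = 2$. Without loss of generality, we may assume that $A = \mathrm{diag}(\lambda_1, \lambda_2)$,
with $\lambda_1 > \lambda_2$. Denoting $x_k$ by

$$x_k = \begin{bmatrix} c_k \\ s_k \end{bmatrix}, \qquad c_k^2 + s_k^2 = 1,$$

it follows that $\mu_k = \lambda_1 c_k^2 + \lambda_2 s_k^2$ in (8.2.6) and

$$z_{k+1} = \frac{1}{\lambda_1 - \lambda_2} \begin{bmatrix} c_k / s_k^2 \\ -s_k / c_k^2 \end{bmatrix}.$$

A calculation shows that

$$c_{k+1} = \frac{c_k^3}{\sqrt{c_k^6 + s_k^6}}, \qquad s_{k+1} = \frac{-s_k^3}{\sqrt{c_k^6 + s_k^6}}. \tag{8.2.7}$$

From these equations it is clear that the $x_k$ converge cubically to either
$\mathrm{span}\{e_1\}$ or $\mathrm{span}\{e_2\}$ provided $|c_k| \neq |s_k|$.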
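The calculation behind (8.2.7) is easy to check numerically. The plain-Python sketch below performs one step of (8.2.6) on $A = \mathrm{diag}(\lambda_1, \lambda_2)$ directly and compares it with the closed form (8.2.7). It also prints the ratio $s_k/c_k$, which by (8.2.7) satisfies $s_{k+1}/c_{k+1} = -(s_k/c_k)^3$, the cubic convergence claimed above. The values $\lambda_1 = 5$, $\lambda_2 = 2$ and the starting vector $(0.8, 0.6)$ are arbitrary choices for illustration.

```python
import math

def rqi_step_diag(lam1, lam2, c, s):
    """One step of (8.2.6) for A = diag(lam1, lam2) and x_k = [c, s]^T."""
    mu = lam1 * c * c + lam2 * s * s          # mu_k = r(x_k)
    z = [c / (lam1 - mu), s / (lam2 - mu)]     # (A - mu_k I) z = x_k, A diagonal
    nz = math.hypot(z[0], z[1])
    return z[0] / nz, z[1] / nz

def step_8_2_7(c, s):
    """Closed form (8.2.7)."""
    d = math.sqrt(c ** 6 + s ** 6)
    return c ** 3 / d, -s ** 3 / d

# Only three steps: after that s_k is so small that lam1 - mu underflows toward 0.
c, s = 0.8, 0.6
for k in range(3):
    c_new, s_new = rqi_step_diag(5.0, 2.0, c, s)
    print(k + 1, (c_new, s_new), step_8_2_7(c, s), s_new / c_new)
    c, s = c_new, s_new
```

Starting from $s_0/c_0 = 0.75$, the printed ratios should be about $-0.42$, $0.075$, and $-4.2 \times 10^{-4}$.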

Details associated with the practical implementation of the Rayleigh 
quotient iteration may be found in Parlett (1974). 
8.2.4 Orthogonal Iteration


A straightforward generalization of the power method can be used to compute
higher-dimensional invariant subspaces. Let $r$ be a chosen integer
satisfying $1 \le r \le n$. Given an $n$-by-$r$ matrix $Q_0$ with orthonormal
columns, the method of orthogonal iteration generates a sequence of matrices
$\{Q_k\} \subseteq \mathbb{R}^{n \times r}$ as follows:

    for k = 1, 2, ...
        Z_k = A Q_{k-1}                                      (8.2.8)
        Q_k R_k = Z_k          (QR factorization)
    end

Note that if $r = 1$, then this is just the power method. Moreover, the
sequence $\{Q_k e_1\}$ is precisely the sequence of vectors produced by the power
iteration with starting vector $q^{(0)} = Q_0 e_1$.

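In order to analyze the behavior of (8.2.8), assume that

$$Q^T A Q = D = \mathrm{diag}(\lambda_i), \qquad |\lambda_1| \ge |\lambda_2| \ge \cdots \ge |\lambda_n| \tag{8.2.9}$$

is a Schur decomposition of $A \in \mathbb{R}^{n \times n}$. Partition $Q$ and $D$ as follows:

$$Q = [\, Q_\alpha \;\; Q_\beta \,], \qquad D = \begin{bmatrix} D_1 & 0 \\ 0 & D_2 \end{bmatrix} \tag{8.2.10}$$

where $Q_\alpha$ has $r$ columns, $Q_\beta$ has $n-r$ columns, $D_1$ is $r$-by-$r$, and $D_2$ is
$(n-r)$-by-$(n-r)$. If $|\lambda_r| > |\lambda_{r+1}|$, then

$$D_r(A) = \mathrm{ran}(Q_\alpha)$$

is the dominant invariant subspace of dimension $r$. It is the unique invariant
subspace associated with the eigenvalues $\lambda_1, \ldots, \lambda_r$.

The following theorem shows that with reasonable assumptions, the
subspaces $\mathrm{ran}(Q_k)$ generated by (8.2.8) converge to $D_r(A)$ at a rate
proportional to $|\lambda_{r+1}/\lambda_r|^k$.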


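Theorem 8.2.2 Let the Schur decomposition of $A \in \mathbb{R}^{n \times n}$ be given by
(8.2.9) and (8.2.10) with $n \ge 2$. Assume that $|\lambda_r| > |\lambda_{r+1}|$ and that the
$n$-by-$r$ matrices $\{Q_k\}$ are defined by (8.2.8). If $\theta \in [0, \pi/2]$ is specified by

$$\cos(\theta) = \min_{u \in D_r(A)} \; \max_{v \in \mathrm{ran}(Q_0)} \frac{|u^T v|}{\| u \|_2 \, \| v \|_2} > 0,$$

then

$$\mathrm{dist}(D_r(A), \mathrm{ran}(Q_k)) \le \tan(\theta) \left| \frac{\lambda_{r+1}}{\lambda_r} \right|^k .$$

See also Theorem 7.3.1.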
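Proof. By induction it can be shown that

$$A^k Q_0 = Q_k (R_k \cdots R_1)$$

and so with the partitionings (8.2.10) we have

$$\begin{bmatrix} D_1^k & 0 \\ 0 & D_2^k \end{bmatrix} \begin{bmatrix} Q_\alpha^T Q_0 \\ Q_\beta^T Q_0 \end{bmatrix} = \begin{bmatrix} Q_\alpha^T Q_k \\ Q_\beta^T Q_k \end{bmatrix} (R_k \cdots R_1).$$

If

$$\begin{bmatrix} V_k \\ W_k \end{bmatrix} = \begin{bmatrix} Q_\alpha^T Q_k \\ Q_\beta^T Q_k \end{bmatrix}$$

then

$$\begin{aligned}
\cos(\theta) &= \sigma_r(V_0) = \sqrt{1 - \| W_0 \|_2^2} \\
\mathrm{dist}(D_r(A), \mathrm{ran}(Q_k)) &= \| W_k \|_2 \\
D_1^k V_0 &= V_k (R_k \cdots R_1) \\
D_2^k W_0 &= W_k (R_k \cdots R_1).
\end{aligned}$$

It follows that $V_0$ is nonsingular which in turn implies that $V_k$ and $(R_k \cdots R_1)$
are also nonsingular. Thus,

$$W_k = D_2^k W_0 (R_k \cdots R_1)^{-1} = D_2^k W_0 V_0^{-1} D_1^{-k} V_k$$

and so

$$\| W_k \|_2 \le \| D_2^k \|_2 \, \| W_0 \|_2 \, \| V_0^{-1} \|_2 \, \| D_1^{-k} \|_2 \, \| V_k \|_2
\le |\lambda_{r+1}|^k \cdot \sin(\theta) \cdot \frac{1}{\cos(\theta)} \cdot \frac{1}{|\lambda_r|^k}
= \tan(\theta) \left| \frac{\lambda_{r+1}}{\lambda_r} \right|^k . \qquad \square$$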


Example 8.2.3 If (8.2.8) is applied to the matrix of Example 8.2.1 with $r = 2$ and
$Q_0 = I_4(:, 1{:}2)$, then

     k    dist(D_2(A), ran(Q_k))
     1         0.8806
     2         0.4091
     3         0.1121
     4         0.0313
     5         0.0106
     6         0.0044
     7         0.0020
     8         0.0010
     9         0.0005
    10         0.0002
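A plain-Python sketch of orthogonal iteration (8.2.8) is given below. For brevity the QR factorization is done by modified Gram-Schmidt rather than by Householder or Givens transformations; that substitution is our assumption and is adequate for this small, well-conditioned example. As a check that does not require the eigenvectors of $A$, it forms the 2-by-2 matrix $Q_k^T A Q_k$, whose eigenvalues should approach $\lambda_1 = 12$ and $\lambda_2 = 8$ as $\mathrm{ran}(Q_k)$ approaches $D_2(A)$. The distances in Example 8.2.3 eventually shrink by roughly $|\lambda_3/\lambda_2| = 4/8 = 1/2$ per step, in line with Theorem 8.2.2.

```python
import math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

def mgs_qr_q(Z):
    """Orthonormal basis of ran(Z) (n-by-r, full column rank) by modified Gram-Schmidt."""
    qs = []
    for v in transpose(Z):
        for q in qs:
            h = sum(a * b for a, b in zip(q, v))
            v = [a - h * b for a, b in zip(v, q)]
        nv = math.sqrt(sum(a * a for a in v))
        qs.append([a / nv for a in v])
    return transpose(qs)

def orthogonal_iteration(A, Q0, iters):
    Q = Q0
    for _ in range(iters):
        Q = mgs_qr_q(matmul(A, Q))      # Z_k = A Q_{k-1};  Q_k R_k = Z_k
    return Q

A = [[-1.6407, 1.0814, 1.2014, 1.1539],
     [1.0814, 4.1573, 7.4035, -1.0463],
     [1.2014, 7.4035, 2.7890, -1.5737],
     [1.1539, -1.0463, -1.5737, 8.6944]]
Q0 = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]]   # Q_0 = I_4(:, 1:2)
Q = orthogonal_iteration(A, Q0, 60)
S = matmul(transpose(Q), matmul(A, Q))                  # 2-by-2 projection of A
print("trace =", S[0][0] + S[1][1], " det =", S[0][0] * S[1][1] - S[0][1] * S[1][0])
```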
8.2.5 The QR Iteration


Consider what happens when we apply the method of orthogonal iteration
(8.2.8) with $r = n$. Let $Q^T A Q = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$ be the Schur
decomposition and assume

$$|\lambda_1| > |\lambda_2| > \cdots > |\lambda_n|.$$

If $Q = [q_1, \ldots, q_n]$, $Q_k = [q_1^{(k)}, \ldots, q_n^{(k)}]$, and

$$\mathrm{dist}\left( D_i(A), \mathrm{span}\{q_1^{(0)}, \ldots, q_i^{(0)}\} \right) < 1 \tag{8.2.11}$$

for $i = 1{:}n-1$, then it follows from Theorem 8.2.2 that

$$\mathrm{dist}\left( \mathrm{span}\{q_1^{(k)}, \ldots, q_i^{(k)}\}, \mathrm{span}\{q_1, \ldots, q_i\} \right) = O\left( \left| \frac{\lambda_{i+1}}{\lambda_i} \right|^k \right)$$

for $i = 1{:}n-1$. This implies that the matrices $T_k$ defined by

$$T_k = Q_k^T A Q_k$$

are converging to diagonal form. Thus, it can be said that the method
of orthogonal iteration computes a Schur decomposition if $r = n$ and the
original iterate $Q_0 \in \mathbb{R}^{n \times n}$ is not deficient in the sense of (8.2.11).

The QR iteration arises by considering how to compute the matrix $T_k$
directly from its predecessor $T_{k-1}$. On the one hand, we have from (8.2.8)
and the definition of $T_{k-1}$ that

$$T_{k-1} = Q_{k-1}^T A Q_{k-1} = Q_{k-1}^T (A Q_{k-1}) = (Q_{k-1}^T Q_k) R_k .$$

On the other hand,

$$T_k = Q_k^T A Q_k = (Q_k^T A Q_{k-1}) (Q_{k-1}^T Q_k) = R_k (Q_{k-1}^T Q_k).$$

Thus, $T_k$ is determined by computing the QR factorization of $T_{k-1}$ and
then multiplying the factors together in reverse order. This is precisely
what is done in (8.2.1).
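The unshifted QR iteration (8.2.1) with $U_0 = I$ is equally short. The sketch below (plain Python, with modified Gram-Schmidt QR standing in for the Householder QR one would normally use; `qr_mgs` and `qr_iteration` are illustrative names) factors $T_{k-1} = U_k R_k$ and forms $T_k = R_k U_k$. Because $T_k = Q_k^T A Q_k$ with $Q_k e_1$ proportional to $A^k e_1$, the (1,1) entry of $T_{10}$ should agree with $\lambda^{(10)} = 11.9907$ from Example 8.2.1, and the diagonal should eventually settle on the eigenvalues in order of decreasing modulus.

```python
import math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def qr_mgs(A):
    """A = Q R for a square nonsingular A, by modified Gram-Schmidt."""
    n = len(A)
    cols = [[A[i][j] for i in range(n)] for j in range(n)]
    qs, R = [], [[0.0] * n for _ in range(n)]
    for j in range(n):
        v = cols[j]
        for i, q in enumerate(qs):
            R[i][j] = sum(a * b for a, b in zip(q, v))
            v = [a - R[i][j] * b for a, b in zip(v, q)]
        R[j][j] = math.sqrt(sum(a * a for a in v))
        qs.append([a / R[j][j] for a in v])
    return [list(row) for row in zip(*qs)], R

def qr_iteration(A, iters):
    """Unshifted QR iteration (8.2.1) with U_0 = I."""
    T = [row[:] for row in A]
    for _ in range(iters):
        U, R = qr_mgs(T)          # T_{k-1} = U_k R_k
        T = matmul(R, U)          # T_k = R_k U_k
    return T

A = [[-1.6407, 1.0814, 1.2014, 1.1539],
     [1.0814, 4.1573, 7.4035, -1.0463],
     [1.2014, 7.4035, 2.7890, -1.5737],
     [1.1539, -1.0463, -1.5737, 8.6944]]
T10 = qr_iteration(A, 10)
for row in T10:
    print(" ".join(f"{v:8.4f}" for v in row))
```

Each step costs $O(n^3)$ work here, which is the inefficiency the next section removes.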


Example 8.2.4 If the QR iteration (8.2.1) is applied to the matrix in Example 8.2.1,
then after 10 iterations

$$T_{10} = \begin{bmatrix}
11.9907 & -0.1926 & -0.0004 &  0.0000 \\
-0.1926 &  8.0093 & -0.0029 &  0.0001 \\
-0.0004 & -0.0029 & -4.0000 &  0.0007 \\
 0.0000 &  0.0001 &  0.0007 & -2.0000
\end{bmatrix}.$$

The off-diagonal entries of the $T_k$ matrices go to zero as follows:

     k   |T_k(2,1)|  |T_k(3,1)|  |T_k(4,1)|  |T_k(3,2)|  |T_k(4,2)|  |T_k(4,3)|
     1     3.9254      1.8122      3.3892      4.2492      2.8367      1.1679
     2     2.6491      1.2841      2.1908      1.1587      3.1473      0.2294
     3     2.0147      0.6154      0.5082      0.0997      0.9859      0.0748
     4     1.6930      0.2408      0.0970      0.0723      0.2596      0.0440
     5     1.2928      0.0866      0.0173      0.0665      0.0667      0.0233
     6     0.9222      0.0299      0.0030      0.0405      0.0169      0.0118
     7     0.6346      0.0101      0.0005      0.0219      0.0043      0.0059
     8     0.4292      0.0034      0.0001      0.0113      0.0011      0.0030
     9     0.2880      0.0011      0.0000      0.0057      0.0003      0.0015
    10     0.1926      0.0004      0.0000      0.0029      0.0001      0.0007


Note that a single QR iteration involves $O(n^3)$ flops. Moreover, since
convergence is only linear (when it exists), it is clear that the method is a
prohibitively expensive way to compute Schur decompositions. Fortunately,
these practical difficulties can be overcome, as we show in the next section.


Problems 


P8.2.1 Suppose $A_0 \in \mathbb{R}^{n \times n}$ is symmetric and positive definite and consider the following
iteration:

    for k = 1, 2, ...
        A_{k-1} = G_k G_k^T      (Cholesky)
        A_k = G_k^T G_k
    end

(a) Show that this iteration is defined. (b) Show that if

$$A_0 = \begin{bmatrix} a & b \\ b & c \end{bmatrix}$$

with $a > c$ has eigenvalues $\lambda_1 > \lambda_2 > 0$, then the $A_k$ converge to $\mathrm{diag}(\lambda_1, \lambda_2)$.

P8.2.2 Prove (8.2.7).


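P8.2.3 Suppose $A \in \mathbb{R}^{n \times n}$ is symmetric and define the function $f: \mathbb{R}^{n+1} \to \mathbb{R}^{n+1}$ by

$$f\begin{pmatrix} x \\ \lambda \end{pmatrix} = \begin{bmatrix} Ax - \lambda x \\ (x^T x - 1)/2 \end{bmatrix}$$

where $x \in \mathbb{R}^n$ and $\lambda \in \mathbb{R}$. Suppose $x_+$ and $\lambda_+$ are produced by applying Newton's
method to $f$ at the "current point" defined by $x_c$ and $\lambda_c$. Give expressions for $x_+$ and
$\lambda_+$ assuming that $\| x_c \|_2 = 1$ and $\lambda_c = x_c^T A x_c$.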


Notes and References for Sec. 8.2 


The following references are concerned with the method of orthogonal iteration (a.k.a. 
the method of simultaneous iteration): 


G.W. Stewart (1969). “Accelerating The Orthogonal Iteration for the Eigenvalues of a 
Hermitian Matrix,” Numer. Math. 13, 362-76. 

M. Clint and A. Jennings (1970). “The Evaluation of Eigenvalues and Eigenvectors of 
Real Symmetric Matrices by Simultaneous Iteration,” Comp. J. 13, 76-80. 

H. Rutishauser (1970). “Simultaneous Iteration Method for Symmetric Matrices,” Nu- 
mer. Math. 16, 205-23. See also Wilkinson and Reinsch (1971,pp.284-302). 
References for the Rayleigh quotient method include


J. Vandergraft (1971). “Generalized Rayleigh Methods with Applications to Finding 
Eigenvalues of Large Matrices,” Lin. Alg. and Its Applic. 4, 353-68. 

B.N. Parlett (1974). “The Rayleigh Quotient Iteration and Some Generalizations for 
Nonnormal Matrices,” Math. Comp. 28, 679-93. 

R.A. Tapia and D.L. Whitley (1988). “The Projected Newton Method Has Order 1+√2
for the Symmetric Eigenvalue Problem,” SIAM J. Num. Anal. 25, 1376-1382.

S. Batterson and J. Smillie (1989). “The Dynamics of Rayleigh Quotient Iteration,” 
SIAM J. Num. Anal. 26, 624-636. 

C. Beattie and D.W. Fox (1989). “Localization Criteria and Containment for Rayleigh
Quotient Iteration,” SIAM J. Matrix Anal. Appl. 10, 80-93.

P.T.P. Tang (1994). “Dynamic Condition Estimation and Rayleigh-Ritz
Approximation,” SIAM J. Matrix Anal. Appl. 15, 331-346.


8.3 The Symmetric QR Algorithm 


The symmetric QR iteration (8.2.1) can be made very efficient in two ways.
First, we show how to compute an orthogonal $U_0$ such that $U_0^T A U_0 = T$ is
tridiagonal. With this reduction, the iterates produced by (8.2.1) are all
tridiagonal and this reduces the work per step to $O(n^2)$. Second, the idea of
shifts is introduced, and with this change the convergence to diagonal form
proceeds at a cubic rate. This is far better than having the off-diagonal
entries going to zero like $|\lambda_{i+1}/\lambda_i|^k$ as discussed in §8.2.5.


8.3.1 Reduction to Tridiagonal Form 


If $A$ is symmetric, then it is possible to find an orthogonal $Q$ such that

$$Q^T A Q = T \tag{8.3.1}$$

is tridiagonal. We call this the tridiagonal decomposition and as a compression
of data, it represents a very big step towards diagonalization.

We show how to compute (8.3.1) with Householder matrices. Suppose
that Householder matrices $P_1, \ldots, P_{k-1}$ have been determined such that if
$A_{k-1} = (P_1 \cdots P_{k-1})^T A (P_1 \cdots P_{k-1})$, then

$$A_{k-1} = \begin{bmatrix} B_{11} & B_{12} & 0 \\ B_{21} & B_{22} & B_{23} \\ 0 & B_{32} & B_{33} \end{bmatrix}$$

(with row and column blocks of sizes $k-1$, $1$, and $n-k$) is tridiagonal through its
first $k-1$ columns. If $\tilde P_k$ is an order $n-k$ Householder matrix such that
$\tilde P_k B_{32}$ is a multiple of $I_{n-k}(:, 1)$ and if $P_k = \mathrm{diag}(I_k, \tilde P_k)$, then the leading
$k$-by-$k$ principal submatrix of

$$A_k = P_k A_{k-1} P_k = \begin{bmatrix} B_{11} & B_{12} & 0 \\ B_{21} & B_{22} & B_{23} \tilde P_k \\ 0 & \tilde P_k B_{32} & \tilde P_k B_{33} \tilde P_k \end{bmatrix}$$

is tridiagonal. Clearly, if $U_0 = P_1 \cdots P_{n-2}$, then $U_0^T A U_0 = T$ is tridiagonal.
In the calculation of $A_k$ it is important to exploit symmetry during the
formation of the matrix $\tilde P_k B_{33} \tilde P_k$. To be specific, suppose that $\tilde P_k$ has the
form

$$\tilde P_k = I - \beta v v^T, \qquad \beta = 2/v^T v, \qquad 0 \neq v \in \mathbb{R}^{n-k}.$$

Note that if $p = \beta B_{33} v$ and $w = p - (\beta p^T v/2) v$, then

$$\tilde P_k B_{33} \tilde P_k = B_{33} - v w^T - w v^T .$$

Since only the upper triangular portion of this matrix needs to be calculated,
we see that the transition from $A_{k-1}$ to $A_k$ can be accomplished in
only $4(n-k)^2$ flops.


Algorithm 8.3.1 (Householder Tridiagonalization) Given a symmetric
$A \in \mathbb{R}^{n \times n}$, the following algorithm overwrites $A$ with $T = Q^T A Q$,
where $T$ is tridiagonal and $Q = H_1 \cdots H_{n-2}$ is the product of Householder
transformations.

    for k = 1:n-2
        [v, β] = house(A(k+1:n, k))
        p = β A(k+1:n, k+1:n) v
        w = p - (β p^T v / 2) v
        A(k+1, k) = || A(k+1:n, k) ||_2;  A(k, k+1) = A(k+1, k)
        A(k+1:n, k+1:n) = A(k+1:n, k+1:n) - v w^T - w v^T
    end

This algorithm requires $4n^3/3$ flops when symmetry is exploited in calculating
the rank-2 update. The matrix $Q$ can be stored in factored form in
the subdiagonal portion of $A$. If $Q$ is explicitly required, then it can be
formed with an additional $4n^3/3$ flops.

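Example 8.3.1

$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & .6 & .8 \\ 0 & .8 & -.6 \end{bmatrix}^T
\begin{bmatrix} 1 & 3 & 4 \\ 3 & 2 & 8 \\ 4 & 8 & 3 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ 0 & .6 & .8 \\ 0 & .8 & -.6 \end{bmatrix}
= \begin{bmatrix} 1 & 5 & 0 \\ 5 & 10.32 & 1.76 \\ 0 & 1.76 & -5.32 \end{bmatrix}.$$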
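Algorithm 8.3.1 translates almost line for line into the plain-Python sketch below. The helper `house` is our own version of a Householder-vector routine (it returns $v$ with $v(1) = 1$ and $\beta$ such that $(I - \beta v v^T)x = \|x\|_2 e_1$ whenever $x(2{:}\mathrm{end}) \ne 0$). The sketch writes explicit zeros below the subdiagonal instead of storing $v$ there, and it updates the full trailing block rather than only its upper triangle, so it does not achieve the $4n^3/3$ flop count. Run on the matrix of Example 8.3.1, it should reproduce the tridiagonal matrix shown.

```python
import math

def house(x):
    """Return (v, beta, alpha) with v[0] = 1 and (I - beta v v^T) x = alpha e_1."""
    sigma = sum(xi * xi for xi in x[1:])
    if sigma == 0.0:
        return [1.0] + [0.0] * (len(x) - 1), 0.0, x[0]     # nothing to annihilate
    mu = math.sqrt(x[0] * x[0] + sigma)
    v0 = x[0] - mu if x[0] <= 0 else -sigma / (x[0] + mu)  # avoids cancellation
    beta = 2.0 * v0 * v0 / (sigma + v0 * v0)
    return [1.0] + [xi / v0 for xi in x[1:]], beta, mu

def tridiagonalize(A):
    """Householder tridiagonalization (Algorithm 8.3.1) of a symmetric matrix."""
    A = [row[:] for row in A]
    n = len(A)
    for k in range(n - 2):
        m = n - k - 1                                       # order of the reflector
        v, beta, alpha = house([A[k + 1 + i][k] for i in range(m)])
        B = [[A[k + 1 + i][k + 1 + j] for j in range(m)] for i in range(m)]
        p = [beta * sum(B[i][j] * v[j] for j in range(m)) for i in range(m)]
        pv = sum(pi * vi for pi, vi in zip(p, v))
        w = [pi - (beta * pv / 2.0) * vi for pi, vi in zip(p, v)]
        A[k + 1][k] = A[k][k + 1] = alpha
        for i in range(k + 2, n):                           # entries zeroed by the reflector
            A[i][k] = A[k][i] = 0.0
        for i in range(m):                                  # rank-2 update B - v w^T - w v^T
            for j in range(m):
                A[k + 1 + i][k + 1 + j] = B[i][j] - v[i] * w[j] - w[i] * v[j]
    return A

T = tridiagonalize([[1.0, 3.0, 4.0], [3.0, 2.0, 8.0], [4.0, 8.0, 3.0]])
for row in T:
    print(" ".join(f"{v:7.2f}" for v in row))
```

For $x = (3, 4)^T$ this `house` yields $v = (1, -2)^T$ and $\beta = 0.4$, i.e., exactly the reflector $\begin{bmatrix} .6 & .8 \\ .8 & -.6 \end{bmatrix}$ used in Example 8.3.1.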


Note that if $T$ has a zero subdiagonal, then the eigenproblem splits into
a pair of smaller eigenproblems. In particular, if $t_{k+1,k} = 0$, then
$\lambda(T) = \lambda(T(1{:}k, 1{:}k)) \cup \lambda(T(k+1{:}n, k+1{:}n))$. If $T$ has no zero subdiagonal
entries, then it is said to be unreduced.

Let $\hat T$ denote the computed version of $T$ obtained by Algorithm 8.3.1.
It can be shown that $\hat T = \tilde Q^T (A + E) \tilde Q$ where $\tilde Q$ is exactly orthogonal and
$E$ is a symmetric matrix satisfying $\| E \|_F \le c\, \mathbf{u} \, \| A \|_F$ where $c$ is a small
constant. See Wilkinson (1965, p. 297).


8.3.2 Properties of the Tridiagonal Decomposition 


We prove two theorems about the tridiagonal decomposition both of which
have key roles to play in the sequel. The first connects (8.3.1) to the QR
factorization of a certain Krylov matrix. These matrices have the form

$$K(A, v, k) = [\, v, \; Av, \; \ldots, \; A^{k-1} v \,], \qquad A \in \mathbb{R}^{n \times n}, \; v \in \mathbb{R}^n .$$


Theorem 8.3.1 If $Q^T A Q = T$ is the tridiagonal decomposition of the symmetric
matrix $A \in \mathbb{R}^{n \times n}$, then $Q^T K(A, Q(:,1), n) = R$ is upper triangular.
If $R$ is nonsingular, then $T$ is unreduced. If $R$ is singular and $k$ is the
smallest index so $r_{kk} = 0$, then $k$ is also the smallest index so $t_{k,k-1}$ is
zero. See also Theorem 7.4.3.


Proof. It is clear that if $q_1 = Q(:, 1)$, then

$$Q^T K(A, Q(:,1), n) = [\, Q^T q_1, \; (Q^T A Q)(Q^T q_1), \; \ldots, \; (Q^T A Q)^{n-1} (Q^T q_1) \,]
= [\, e_1, \; T e_1, \; \ldots, \; T^{n-1} e_1 \,] = R$$

is upper triangular with the property that $r_{11} = 1$ and $r_{ii} = t_{21} t_{32} \cdots t_{i,i-1}$
for $i = 2{:}n$. Clearly, if $R$ is nonsingular, then $T$ is unreduced. If $R$ is
singular and $r_{kk}$ is its first zero diagonal entry, then $k \ge 2$ and $t_{k,k-1}$ is the
first zero subdiagonal entry. $\square$


The next result shows that Q is essentially unique once Q(:, 1) is specified. 


Theorem 8.3.2 (Implicit Q Theorem) Suppose $Q = [q_1, \ldots, q_n]$ and
$V = [v_1, \ldots, v_n]$ are orthogonal matrices with the property that both $Q^T A Q = T$
and $V^T A V = S$ are tridiagonal where $A \in \mathbb{R}^{n \times n}$ is symmetric. Let $k$
denote the smallest positive integer for which $t_{k+1,k} = 0$, with the
convention that $k = n$ if $T$ is unreduced. If $v_1 = q_1$, then $v_i = \pm q_i$ and
$|t_{i,i-1}| = |s_{i,i-1}|$ for $i = 2{:}k$. Moreover, if $k < n$, then $s_{k+1,k} = 0$. See also
Theorem 7.4.2.


Proof. Define the orthogonal matrix $W = Q^T V$ and observe that $W(:, 1) =
I_n(:, 1) = e_1$ and $W^T T W = S$. By Theorem 8.3.1, $W^T K(T, e_1, k)$ is upper
triangular with full column rank. But $K(T, e_1, k)$ is upper triangular and
so by the essential uniqueness of the thin QR factorization,

$$W(:, 1{:}k) = I_n(:, 1{:}k) \, \mathrm{diag}(\pm 1, \ldots, \pm 1).$$

This says that $Q(:, i) = \pm V(:, i)$ for $i = 1{:}k$. The comments about the
subdiagonal entries follow from this since $t_{i+1,i} = Q(:, i+1)^T A Q(:, i)$ and
$s_{i+1,i} = V(:, i+1)^T A V(:, i)$ for $i = 1{:}n-1$. $\square$


8.3.3 The QR Iteration and Tridiagonal Matrices 


We quickly state four facts that pertain to the QR iteration and tridiagonal
matrices. Complete verifications are straightforward.

1. Preservation of Form. If $T = QR$ is the QR factorization of a symmetric
   tridiagonal matrix $T \in \mathbb{R}^{n \times n}$, then $Q$ has lower bandwidth 1
   and $R$ has upper bandwidth 2 and it follows that

   $$T_+ = RQ = Q^T (QR) Q = Q^T T Q$$

   is also symmetric and tridiagonal.

2. Shifts. If $s \in \mathbb{R}$ and $T - sI = QR$ is the QR factorization, then

   $$T_+ = RQ + sI = Q^T T Q$$

   is also tridiagonal. This is called a shifted QR step.

3. Perfect Shifts. If $T$ is unreduced, then the first $n-1$ columns of $T - sI$
   are independent regardless of $s$. Thus, if $s \in \lambda(T)$ and

   $$QR = T - sI$$

   is a QR factorization, then $r_{nn} = 0$ and the last column of
   $T_+ = RQ + sI$ equals $s I_n(:, n) = s e_n$.

4. Cost. If $T \in \mathbb{R}^{n \times n}$ is tridiagonal, then its QR factorization can be
   computed by applying a sequence of $n-1$ Givens rotations:

       for k = 1:n-1
           [c, s] = givens(t_kk, t_{k+1,k})
           m = min(k+2, n)
           T(k:k+1, k:m) = [ c  s ; -s  c ]^T T(k:k+1, k:m)
       end

   This requires $O(n)$ flops. If the rotations are accumulated, then $O(n^2)$
   flops are needed.
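Facts 1 through 4 can be exercised with the plain-Python sketch below. It stores $T$ as a dense list of rows for readability (a real code would keep two $n$-vectors), computes $T - sI = QR$ with $n-1$ Givens rotations that touch only columns $k{:}\min(k+2, n)$ as in fact 4, and then forms $RQ + sI$ by applying the same rotations on the right. The `givens` helper follows the convention $\begin{bmatrix} c & s \\ -s & c \end{bmatrix}^T \begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} r \\ 0 \end{bmatrix}$; the names and the 2-by-2 perfect-shift check are illustrative choices.

```python
import math

def givens(a, b):
    """(c, s) such that [c s; -s c]^T [a; b] = [r; 0]."""
    if b == 0.0:
        return 1.0, 0.0
    if abs(b) > abs(a):
        tau = -a / b
        s = 1.0 / math.sqrt(1.0 + tau * tau)
        return s * tau, s
    tau = -b / a
    c = 1.0 / math.sqrt(1.0 + tau * tau)
    return c, c * tau

def shifted_qr_step(T, shift):
    """Explicit shifted QR step: T - shift*I = QR, return RQ + shift*I (facts 2 and 4)."""
    n = len(T)
    R = [[T[i][j] - (shift if i == j else 0.0) for j in range(n)] for i in range(n)]
    rots = []
    for k in range(n - 1):
        c, s = givens(R[k][k], R[k + 1][k])
        for j in range(k, min(k + 3, n)):          # columns k:m, m = min(k+2, n) (1-based)
            a, b = R[k][j], R[k + 1][j]
            R[k][j], R[k + 1][j] = c * a - s * b, s * a + c * b
        rots.append((c, s))
    for k, (c, s) in enumerate(rots):              # R <- R G_1 G_2 ... G_{n-1} = RQ
        for i in range(n):
            a, b = R[i][k], R[i][k + 1]
            R[i][k], R[i][k + 1] = c * a - s * b, s * a + c * b
    for i in range(n):
        R[i][i] += shift
    return R

T = [[1.0, 1.0, 0.0, 0.0, 0.0],
     [1.0, 2.0, 1.0, 0.0, 0.0],
     [0.0, 1.0, 3.0, 1.0, 0.0],
     [0.0, 0.0, 1.0, 4.0, 1.0],
     [0.0, 0.0, 0.0, 1.0, 5.0]]
T_plus = shifted_qr_step(T, 5.0)
for row in T_plus:
    print(" ".join(f"{v:8.4f}" for v in row))
```

With the perfect shift $s = 3 \in \lambda\left(\begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}\right)$, fact 3 predicts that the last column of $T_+$ is $3 e_2$, which the checks below confirm.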
8.3.4 Explicit Single Shift QR Iteration


If $\mu$ is a good approximate eigenvalue, then we suspect that the $(n, n-1)$ entry
will be small after a QR step with shift $\mu$. This is the philosophy behind
the following iteration:

    T = U_0^T A U_0      (tridiagonal)
    for k = 0, 1, ...
        Determine real shift μ.                              (8.3.2)
        T - μI = UR      (QR factorization)
        T = RU + μI
    end

If

$$T = \begin{bmatrix}
a_1 & b_1 & & 0 \\
b_1 & a_2 & \ddots & \\
 & \ddots & \ddots & b_{n-1} \\
0 & & b_{n-1} & a_n
\end{bmatrix},$$

then one reasonable choice for the shift is $\mu = a_n$. However, a more effective
choice is to shift by the eigenvalue of

$$T(n-1{:}n, n-1{:}n) = \begin{bmatrix} a_{n-1} & b_{n-1} \\ b_{n-1} & a_n \end{bmatrix}$$

that is closer to $a_n$. This is known as the Wilkinson shift and it is given
by

$$\mu = a_n + d - \mathrm{sign}(d) \sqrt{d^2 + b_{n-1}^2}, \tag{8.3.3}$$

where $d = (a_{n-1} - a_n)/2$. Wilkinson (1968b) has shown that (8.3.2) is
cubically convergent with either shift strategy, but gives heuristic reasons
why (8.3.3) is preferred.
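For reference, here is a small plain-Python sketch of the Wilkinson shift (8.3.3) next to the algebraically equivalent form $\mu = a_n - b_{n-1}^2 / \left(d + \mathrm{sign}(d)\sqrt{d^2 + b_{n-1}^2}\right)$ that Algorithm 8.3.2 uses. The second form avoids subtracting two nearly equal numbers when $|d| \gg |b_{n-1}|$. The demonstration values are arbitrary, and taking $\mathrm{sign}(0) = +1$ is our assumption.

```python
import math

def sign(d):
    return 1.0 if d >= 0 else -1.0          # assumption: sign(0) = +1

def wilkinson_shift(a_prev, a_last, b):
    """(8.3.3): eigenvalue of [[a_prev, b], [b, a_last]] closer to a_last."""
    d = (a_prev - a_last) / 2.0
    return a_last + d - sign(d) * math.sqrt(d * d + b * b)

def wilkinson_shift_stable(a_prev, a_last, b):
    """Same eigenvalue in the form used by Algorithm 8.3.2 (needs b != 0 or d != 0)."""
    d = (a_prev - a_last) / 2.0
    return a_last - b * b / (d + sign(d) * math.sqrt(d * d + b * b))

print(wilkinson_shift(3.0, 4.0, 0.01), wilkinson_shift_stable(3.0, 4.0, 0.01))
print(wilkinson_shift(2e8, 0.0, 0.5), wilkinson_shift_stable(2e8, 0.0, 0.5))
```

In the second line the first form returns exactly zero because $d^2 + b_{n-1}^2$ rounds to $d^2$, while the second form returns the correct value $-1.25 \times 10^{-9}$.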


8.3.5 Implicit Shift Version 


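It is possible to execute the transition from $T$ to $T_+ = RU + \mu I = U^T T U$
without explicitly forming the matrix $T - \mu I$. This has advantages when
the shift is much larger than some of the $a_i$. Let $c = \cos(\theta)$ and $s = \sin(\theta)$
be computed such that

$$\begin{bmatrix} c & s \\ -s & c \end{bmatrix}^T \begin{bmatrix} a_1 - \mu \\ b_1 \end{bmatrix} = \begin{bmatrix} \times \\ 0 \end{bmatrix}.$$

If we set $G_1 = G(1, 2, \theta)$, then $G_1 e_1 = U e_1$ and

$$T \leftarrow G_1^T T G_1 = \begin{bmatrix}
\times & \times & + & 0 & 0 & 0 \\
\times & \times & \times & 0 & 0 & 0 \\
+ & \times & \times & \times & 0 & 0 \\
0 & 0 & \times & \times & \times & 0 \\
0 & 0 & 0 & \times & \times & \times \\
0 & 0 & 0 & 0 & \times & \times
\end{bmatrix}.$$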


We are thus in a position to apply the Implicit Q theorem provided we can
compute rotations $G_2, \ldots, G_{n-1}$ with the property that if $Z = G_1 G_2 \cdots G_{n-1}$,
then $Z e_1 = G_1 e_1 = U e_1$ and $Z^T T Z$ is tridiagonal.

Note that the first column of $Z$ and $U$ are identical provided we take
each $G_i$ to be of the form $G_i = G(i, i+1, \theta_i)$, $i = 2{:}n-1$. But $G_i$ of this
form can be used to chase the unwanted nonzero element "+" out of the
matrix $G_1^T T G_1$ as follows:


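$$\xrightarrow{G_2} \begin{bmatrix}
\times & \times & 0 & 0 & 0 & 0 \\
\times & \times & \times & + & 0 & 0 \\
0 & \times & \times & \times & 0 & 0 \\
0 & + & \times & \times & \times & 0 \\
0 & 0 & 0 & \times & \times & \times \\
0 & 0 & 0 & 0 & \times & \times
\end{bmatrix}
\xrightarrow{G_3} \begin{bmatrix}
\times & \times & 0 & 0 & 0 & 0 \\
\times & \times & \times & 0 & 0 & 0 \\
0 & \times & \times & \times & + & 0 \\
0 & 0 & \times & \times & \times & 0 \\
0 & 0 & + & \times & \times & \times \\
0 & 0 & 0 & 0 & \times & \times
\end{bmatrix}$$

$$\xrightarrow{G_4} \begin{bmatrix}
\times & \times & 0 & 0 & 0 & 0 \\
\times & \times & \times & 0 & 0 & 0 \\
0 & \times & \times & \times & 0 & 0 \\
0 & 0 & \times & \times & \times & + \\
0 & 0 & 0 & \times & \times & \times \\
0 & 0 & 0 & + & \times & \times
\end{bmatrix}
\xrightarrow{G_5} \begin{bmatrix}
\times & \times & 0 & 0 & 0 & 0 \\
\times & \times & \times & 0 & 0 & 0 \\
0 & \times & \times & \times & 0 & 0 \\
0 & 0 & \times & \times & \times & 0 \\
0 & 0 & 0 & \times & \times & \times \\
0 & 0 & 0 & 0 & \times & \times
\end{bmatrix}$$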


Thus, it follows from the Implicit Q theorem that the tridiagonal matrix
$Z^T T Z$ produced by this zero-chasing technique is essentially the same as the
tridiagonal matrix $T_+$ obtained by the explicit method. (We may assume
that all tridiagonal matrices in question are unreduced for otherwise the
problem decouples.)

Note that at any stage of the zero-chasing, there is only one nonzero
entry outside the tridiagonal band. How this nonzero entry moves down
the matrix during the update $T \leftarrow G_k^T T G_k$ is illustrated in the following:


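$$\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & c & s & 0 \\ 0 & -s & c & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}^T
\begin{bmatrix} a_k & b_k & z_k & 0 \\ b_k & a_p & b_p & 0 \\ z_k & b_p & a_q & b_q \\ 0 & 0 & b_q & a_r \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & c & s & 0 \\ 0 & -s & c & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
= \begin{bmatrix} a_k & \bar b_k & 0 & 0 \\ \bar b_k & \bar a_p & \bar b_p & \bar z_p \\ 0 & \bar b_p & \bar a_q & \bar b_q \\ 0 & \bar z_p & \bar b_q & a_r \end{bmatrix}$$

Here $(p, q, r) = (k+1, k+2, k+3)$. This update can be performed in about
26 flops once $c$ and $s$ have been determined from the equation $b_k s + z_k c = 0$.
Overall we obtain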


Algorithm 8.3.2 (Implicit Symmetric QR Step with Wilkinson
Shift) Given an unreduced symmetric tridiagonal matrix $T \in \mathbb{R}^{n \times n}$, the
following algorithm overwrites $T$ with $Z^T T Z$, where $Z = G_1 \cdots G_{n-1}$ is a
product of Givens rotations with the property that $Z^T (T - \mu I)$ is upper
triangular and $\mu$ is that eigenvalue of $T$'s trailing 2-by-2 principal submatrix
closer to $t_{nn}$.

    d = (t_{n-1,n-1} - t_nn) / 2
    μ = t_nn - t_{n,n-1}^2 / (d + sign(d) sqrt(d^2 + t_{n,n-1}^2))
    x = t_11 - μ
    z = t_21
    for k = 1:n-1
        [c, s] = givens(x, z)
        T = G_k^T T G_k, where G_k = G(k, k+1, θ)
        if k < n-1
            x = t_{k+1,k}
            z = t_{k+2,k}
        end
    end


This algorithm requires about $30n$ flops and $n$ square roots. If a given
orthogonal matrix $Q$ is overwritten with $Q G_1 \cdots G_{n-1}$, then an additional
$6n^2$ flops are needed. Of course, in any practical implementation the
tridiagonal matrix $T$ would be stored in a pair of $n$-vectors and not in an
$n$-by-$n$ array.


Example 8.3.2 If Algorithm 8.3.2 is applied to

$$T = \begin{bmatrix} 1 & 1 & 0 & 0 \\ 1 & 2 & 1 & 0 \\ 0 & 1 & 3 & .01 \\ 0 & 0 & .01 & 4 \end{bmatrix},$$

then the new tridiagonal matrix $T$ is given by

$$T = \begin{bmatrix}
.5000 & .5916 & 0 & 0 \\
.5916 & 1.7857 & .1808 & 0 \\
0 & .1808 & 3.7140 & .0000044 \\
0 & 0 & .0000044 & 4.0002497
\end{bmatrix}.$$

Algorithm 8.3.2 is the basis of the symmetric QR algorithm, the standard
means for computing the Schur decomposition of a dense symmetric matrix.
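Algorithm 8.3.2 can be sketched in plain Python as follows. To keep the correspondence with the pseudocode obvious, $T$ is held as a dense list of rows and each update $T \leftarrow G_k^T T G_k$ rotates two full rows and two full columns. This costs $O(n)$ work per rotation rather than the fixed 26 flops of a two-vector implementation (compare P8.3.7). The `givens` helper and the function name are illustrative. Applied to the matrix of Example 8.3.2, it should produce a tiny (4,3) entry while preserving the trace and the Frobenius norm.

```python
import math

def givens(a, b):
    """(c, s) such that [c s; -s c]^T [a; b] = [r; 0]."""
    if b == 0.0:
        return 1.0, 0.0
    if abs(b) > abs(a):
        tau = -a / b
        s = 1.0 / math.sqrt(1.0 + tau * tau)
        return s * tau, s
    tau = -b / a
    c = 1.0 / math.sqrt(1.0 + tau * tau)
    return c, c * tau

def implicit_symmetric_qr_step(T):
    """Algorithm 8.3.2 on a copy of an unreduced symmetric tridiagonal T (dense storage)."""
    T = [row[:] for row in T]
    n = len(T)
    d = (T[n - 2][n - 2] - T[n - 1][n - 1]) / 2.0
    b2 = T[n - 1][n - 2] ** 2
    sgn = 1.0 if d >= 0 else -1.0
    mu = T[n - 1][n - 1] - b2 / (d + sgn * math.sqrt(d * d + b2))   # Wilkinson shift
    x, z = T[0][0] - mu, T[1][0]
    for k in range(n - 1):
        c, s = givens(x, z)
        for j in range(n):                       # rows k, k+1:  T <- G_k^T T
            a, b = T[k][j], T[k + 1][j]
            T[k][j], T[k + 1][j] = c * a - s * b, s * a + c * b
        for i in range(n):                       # columns k, k+1:  T <- T G_k
            a, b = T[i][k], T[i][k + 1]
            T[i][k], T[i][k + 1] = c * a - s * b, s * a + c * b
        if k < n - 2:                            # bulge to chase at the next step
            x, z = T[k + 1][k], T[k + 2][k]
    return T

T = [[1.0, 1.0, 0.0, 0.0],
     [1.0, 2.0, 1.0, 0.0],
     [0.0, 1.0, 3.0, 0.01],
     [0.0, 0.0, 0.01, 4.0]]
T1 = implicit_symmetric_qr_step(T)
for row in T1:
    print(" ".join(f"{v:12.7f}" for v in row))
```

The signs of the off-diagonal entries depend on the rotation convention, so only their magnitudes should be compared with Example 8.3.2.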
Algorithm 8.3.3 (Symmetric QR Algorithm) Given $A \in \mathbb{R}^{n \times n}$ (symmetric)
and a tolerance tol greater than the unit roundoff, this algorithm
computes an approximate symmetric Schur decomposition $Q^T A Q = D$. $A$
is overwritten with the tridiagonal decomposition.

    Use Algorithm 8.3.1 to compute the tridiagonalization
        T = (P_1 ⋯ P_{n-2})^T A (P_1 ⋯ P_{n-2}).
    Set D = T and if Q is desired, form Q = P_1 ⋯ P_{n-2}. See §5.1.6.
    until q = n
        For i = 1:n-1, set d_{i+1,i} and d_{i,i+1} to zero if
            |d_{i+1,i}| = |d_{i,i+1}| ≤ tol (|d_ii| + |d_{i+1,i+1}|)
        Find the largest q and the smallest p such that if

                 [ D11   0    0  ]  p
            D =  [  0   D22   0  ]  n-p-q
                 [  0    0   D33 ]  q
                    p  n-p-q  q

        then D33 is diagonal and D22 is unreduced.
        if q < n
            Apply Algorithm 8.3.2 to D22:
                D = diag(I_p, Z, I_q)^T D diag(I_p, Z, I_q)
            If Q is desired, then Q = Q diag(I_p, Z, I_q).
        end
    end

This algorithm requires about $4n^3/3$ flops if $Q$ is not accumulated and
about $9n^3$ flops if $Q$ is accumulated.


Example 8.3.3 Suppose Algorithm 8.3.3 is applied to the tridiagonal matrix

$$A = \begin{bmatrix} 1 & 2 & 0 & 0 \\ 2 & 3 & 4 & 0 \\ 0 & 4 & 5 & 6 \\ 0 & 0 & 6 & 7 \end{bmatrix}.$$

The subdiagonal entries change as follows during the execution of Algorithm 8.3.3:

    Iteration    a21        a32        a43
        1        1.6817     3.2344     .8649
        2        1.6142     2.5755     .0006
        3        1.6245     1.6965     10^-13
        4        1.6245     1.6965     converg.
        5        1.5117     .0150
        6        1.1195     10^-9
        7        .7071      converg.
        8        converg.

Upon completion we find $\lambda(A) = \{-2.4848, .7046, 4.9366, 12.8436\}$.
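Algorithm 8.3.3 adds only deflation bookkeeping to Algorithm 8.3.2. The plain-Python sketch below assumes that its input is already tridiagonal (as in Example 8.3.3), so the Householder stage is omitted; it also does not accumulate $Q$ and again uses dense storage. The tolerance $10^{-12}$ and the iteration cap are arbitrary choices, and `qr_step` and `symmetric_qr_eigenvalues` are illustrative names. It should return the eigenvalues quoted in Example 8.3.3 to the digits shown.

```python
import math

def givens(a, b):
    """(c, s) such that [c s; -s c]^T [a; b] = [r; 0]."""
    if b == 0.0:
        return 1.0, 0.0
    if abs(b) > abs(a):
        tau = -a / b
        s = 1.0 / math.sqrt(1.0 + tau * tau)
        return s * tau, s
    tau = -b / a
    c = 1.0 / math.sqrt(1.0 + tau * tau)
    return c, c * tau

def qr_step(T):
    """Algorithm 8.3.2 (implicit Wilkinson-shift step), in place on an unreduced block."""
    n = len(T)
    d = (T[n - 2][n - 2] - T[n - 1][n - 1]) / 2.0
    b2 = T[n - 1][n - 2] ** 2
    sgn = 1.0 if d >= 0 else -1.0
    mu = T[n - 1][n - 1] - b2 / (d + sgn * math.sqrt(d * d + b2))
    x, z = T[0][0] - mu, T[1][0]
    for k in range(n - 1):
        c, s = givens(x, z)
        for j in range(n):
            a, b = T[k][j], T[k + 1][j]
            T[k][j], T[k + 1][j] = c * a - s * b, s * a + c * b
        for i in range(n):
            a, b = T[i][k], T[i][k + 1]
            T[i][k], T[i][k + 1] = c * a - s * b, s * a + c * b
        if k < n - 2:
            x, z = T[k + 1][k], T[k + 2][k]

def symmetric_qr_eigenvalues(T, tol=1e-12, maxit=500):
    """Algorithm 8.3.3 without Q, for a symmetric tridiagonal T (dense list of rows)."""
    D = [row[:] for row in T]
    n = len(D)
    for _ in range(maxit):
        for i in range(n - 1):                     # set negligible subdiagonals to zero
            if abs(D[i + 1][i]) <= tol * (abs(D[i][i]) + abs(D[i + 1][i + 1])):
                D[i + 1][i] = D[i][i + 1] = 0.0
        j = n - 2                                  # last nonzero subdiagonal is D[j+1][j]
        while j >= 0 and D[j + 1][j] == 0.0:
            j -= 1
        if j < 0:                                  # q = n: D is diagonal
            return sorted(D[i][i] for i in range(n))
        p = j                                      # D22 = D[p:j+2, p:j+2] is unreduced
        while p > 0 and D[p][p - 1] != 0.0:
            p -= 1
        block = [row[p:j + 2] for row in D[p:j + 2]]
        qr_step(block)
        for r, row in enumerate(block):
            D[p + r][p:j + 2] = row
    raise RuntimeError("no convergence")

eigs = symmetric_qr_eigenvalues([[1.0, 2.0, 0.0, 0.0],
                                 [2.0, 3.0, 4.0, 0.0],
                                 [0.0, 4.0, 5.0, 6.0],
                                 [0.0, 0.0, 6.0, 7.0]])
print(eigs)
```

The eigenvalues are returned in increasing order, which is also the order used in Example 8.3.3.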


The computed eigenvalues $\hat\lambda_i$ obtained via Algorithm 8.3.3 are the exact
eigenvalues of a matrix that is near to $A$, i.e., $Q_0^T (A + E) Q_0 = \mathrm{diag}(\hat\lambda_i)$
where $Q_0^T Q_0 = I$ and $\| E \|_2 \approx \mathbf{u} \| A \|_2$. Using Corollary 8.1.6 we know that
the absolute error in each $\hat\lambda_i$ is small in the sense that $|\hat\lambda_i - \lambda_i| \approx \mathbf{u} \| A \|_2$.
If $\hat Q = [\hat q_1, \ldots, \hat q_n]$ is the computed matrix of orthonormal eigenvectors,
then the accuracy of $\hat q_i$ depends on the separation of $\lambda_i$ from the remainder
of the spectrum. See Theorem 8.1.12.

If all of the eigenvalues and a few of the eigenvectors are desired, then
it is cheaper not to accumulate $Q$ in Algorithm 8.3.3. Instead, the desired
eigenvectors can be found via inverse iteration with $T$. See §8.2.2. Usually
just one step is sufficient to get a good eigenvector, even with a random
initial vector.

If just a few eigenvalues and eigenvectors are required, then the special
techniques in §8.5 are appropriate.

It is interesting to note the connection between Rayleigh quotient iteration
and the symmetric QR algorithm. Suppose we apply the latter
to the tridiagonal matrix $T \in \mathbb{R}^{n \times n}$ with shift $\sigma = e_n^T T e_n = t_{nn}$ where
$e_n = I_n(:, n)$. If $T - \sigma I = QR$, then we obtain $\bar T = RQ + \sigma I$. From the
equation $(T - \sigma I) Q = R^T$ it follows that

$$(T - \sigma I) q_n = r_{nn} e_n,$$

where $q_n$ is the last column of the orthogonal matrix $Q$. Thus, if we apply
(8.2.6) with $x_0 = e_n$, then $x_1 = q_n$.

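The identity $(T - \sigma I) q_n = r_{nn} e_n$ is easy to confirm numerically. In the plain-Python sketch below, the 4-by-4 tridiagonal matrix is an arbitrary example, and the QR factorization of the symmetric matrix $T - \sigma I$ with $\sigma = t_{nn}$ is computed by modified Gram-Schmidt (our choice, for brevity). The check is that $T - \sigma I$ applied to the last column of $Q$ is the multiple $r_{nn}$ of $e_n$. Solving $(T - \sigma I) z_1 = e_n$, as the first step of (8.2.6) from $x_0 = e_n$ would, therefore gives $z_1 = q_n / r_{nn}$.

```python
import math

def qr_mgs(A):
    """A = Q R for a square nonsingular A, by modified Gram-Schmidt."""
    n = len(A)
    cols = [[A[i][j] for i in range(n)] for j in range(n)]
    qs, R = [], [[0.0] * n for _ in range(n)]
    for j in range(n):
        v = cols[j]
        for i, q in enumerate(qs):
            R[i][j] = sum(a * b for a, b in zip(q, v))
            v = [a - R[i][j] * b for a, b in zip(v, q)]
        R[j][j] = math.sqrt(sum(a * a for a in v))
        qs.append([a / R[j][j] for a in v])
    return [list(row) for row in zip(*qs)], R

T = [[1.0, 1.0, 0.0, 0.0],
     [1.0, 2.0, 1.0, 0.0],
     [0.0, 1.0, 3.0, 0.5],
     [0.0, 0.0, 0.5, 4.0]]
n = len(T)
sigma = T[n - 1][n - 1]                              # sigma = e_n^T T e_n = t_nn
M = [[T[i][j] - (sigma if i == j else 0.0) for j in range(n)] for i in range(n)]
Q, R = qr_mgs(M)                                     # T - sigma I = QR
q_n = [Q[i][n - 1] for i in range(n)]
v = [sum(M[i][j] * q_n[j] for j in range(n)) for i in range(n)]   # (T - sigma I) q_n
print("(T - sigma I) q_n =", v)
print("r_nn e_n          =", [0.0] * (n - 1) + [R[n - 1][n - 1]])
```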
8.3.6 Orthogonal Iteration with Ritz Acceleration 


Recall from §8.2.4 that an orthogonal iteration step involves a matrix-matrix
product and a QR factorization:

    Z_k = A Q_{k-1}
    Q̂_k R_k = Z_k      (QR factorization)

Theorem 8.1.14 says that we can minimize $\| A \hat Q_k - \hat Q_k S \|_F$ by setting
$S = S_k = \hat Q_k^T A \hat Q_k$. If $U_k^T S_k U_k = D_k$ is the Schur decomposition of $S_k \in \mathbb{R}^{r \times r}$
and $Q_k = \hat Q_k U_k$, then

$$\| A Q_k - Q_k D_k \|_F = \| A \hat Q_k - \hat Q_k S_k \|_F,$$

showing that the columns of $Q_k$ are the best possible basis to take after $k$
steps from the standpoint of minimizing the residual. This defines the Ritz
acceleration idea:
    Q_0 ∈ R^{n×r} given with Q_0^T Q_0 = I_r
    for k = 1, 2, ...
        Z_k = A Q_{k-1}
        Q̂_k R_k = Z_k                (QR factorization)
        S_k = Q̂_k^T A Q̂_k                                   (8.3.6)
        U_k^T S_k U_k = D_k          (Schur decomposition)
        Q_k = Q̂_k U_k
    end

It can be shown that if

$$D_k = \mathrm{diag}(\theta_1^{(k)}, \ldots, \theta_r^{(k)}), \qquad |\theta_1^{(k)}| \ge \cdots \ge |\theta_r^{(k)}|,$$

then

$$|\theta_i^{(k)} - \lambda_i(A)| = O\left( \left| \frac{\lambda_{r+1}}{\lambda_i} \right|^k \right), \qquad i = 1{:}r.$$

Recall that Theorem 8.2.2 says the eigenvalues of $\hat Q_k^T A \hat Q_k$ converge with
rate $|\lambda_{r+1}/\lambda_r|^k$. Thus, the Ritz values converge at a more favorable rate.
For details, see Stewart (1969).


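Example 8.3.4 If we apply (8.3.6) with

$$A = \begin{bmatrix} 100 & 1 & 1 & 1 \\ 1 & 99 & 1 & 1 \\ 1 & 1 & 2 & 1 \\ 1 & 1 & 1 & 1 \end{bmatrix}
\quad \text{and} \quad
Q_0 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \\ 0 & 0 \end{bmatrix},$$

then

    k    dist{D_2(A), Q_k}
    0    .2 × 10^-1
    1    .5 × 10^-3
    2    .1 × 10^-4
    3    .3 × 10^-6
    4    .8 × 10^-8

Clearly, convergence is taking place at the rate $(2/99)^k$.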
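The Ritz acceleration loop (8.3.6) can be sketched in plain Python for the case $r = 2$ used in Example 8.3.4. Two simplifications are ours: the QR step uses modified Gram-Schmidt, and the 2-by-2 symmetric Schur decomposition of $S_k$ is computed with the closed-form cosine-sine formulas of §8.4.2 rather than a general eigensolver. The check is the residual $\| A Q_k - Q_k D_k \|_F$, the quantity that the choice $S = S_k$ minimizes and that the rotation to $Q_k$ leaves unchanged.

```python
import math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

def mgs_q(Z):
    """Orthonormal basis of ran(Z) by modified Gram-Schmidt (Z has full column rank)."""
    qs = []
    for v in transpose(Z):
        for q in qs:
            h = sum(a * b for a, b in zip(q, v))
            v = [a - h * b for a, b in zip(v, q)]
        nv = math.sqrt(sum(a * a for a in v))
        qs.append([a / nv for a in v])
    return transpose(qs)

def schur2(S):
    """(c, s) with [c s; -s c]^T S [c s; -s c] diagonal, for a symmetric 2-by-2 S."""
    if S[0][1] == 0.0:
        return 1.0, 0.0
    tau = (S[1][1] - S[0][0]) / (2.0 * S[0][1])
    if tau >= 0:
        t = 1.0 / (tau + math.sqrt(1.0 + tau * tau))
    else:
        t = -1.0 / (-tau + math.sqrt(1.0 + tau * tau))
    c = 1.0 / math.sqrt(1.0 + t * t)
    return c, t * c

def ritz_iteration(A, Q0, iters):
    """(8.3.6) with r = 2: returns Q_k and the Ritz values (diagonal of D_k)."""
    Q, D = Q0, None
    for _ in range(iters):
        Qh = mgs_q(matmul(A, Q))                      # Z_k = A Q_{k-1};  Qhat_k R_k = Z_k
        S = matmul(transpose(Qh), matmul(A, Qh))      # S_k = Qhat_k^T A Qhat_k
        c, s = schur2(S)
        U = [[c, s], [-s, c]]                         # U_k^T S_k U_k = D_k
        D = matmul(transpose(U), matmul(S, U))
        Q = matmul(Qh, U)                             # Q_k = Qhat_k U_k
    return Q, [D[0][0], D[1][1]]

A = [[100.0, 1.0, 1.0, 1.0],
     [1.0, 99.0, 1.0, 1.0],
     [1.0, 1.0, 2.0, 1.0],
     [1.0, 1.0, 1.0, 1.0]]
Q0 = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]]
Q, theta = ritz_iteration(A, Q0, 8)
AQ = matmul(A, Q)
res = math.sqrt(sum((AQ[i][j] - Q[i][j] * theta[j]) ** 2 for i in range(4) for j in range(2)))
print("Ritz values:", theta, " residual:", res)
```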


Problems 


P8.3.1 Suppose $\lambda$ is an eigenvalue of a symmetric tridiagonal matrix $T$. Show that if
$\lambda$ has algebraic multiplicity $k$, then at least $k-1$ of $T$'s subdiagonal elements are zero.

P8.3.2 Suppose $A$ is symmetric and has bandwidth $p$. Show that if we perform the
shifted QR step $A - \mu I = QR$, $\bar A = RQ + \mu I$, then $\bar A$ has bandwidth $p$.

P8.3.3 Suppose $B \in \mathbb{R}^{n \times n}$ is upper bidiagonal with diagonal entries $d(1{:}n)$ and
superdiagonal entries $f(1{:}n-1)$. State and prove a singular value version of Theorem 8.3.1.

P8.3.4 Let $A = \begin{bmatrix} w & x \\ x & z \end{bmatrix}$ be real and suppose we perform the following shifted QR
step: $A - zI = UR$, $\bar A = RU + zI$. Show that if $\bar A = \begin{bmatrix} \bar w & \bar x \\ \bar x & \bar z \end{bmatrix}$ then

$$\begin{aligned}
\bar w &= w + x^2 (w - z) / [(w - z)^2 + x^2] \\
\bar z &= z - x^2 (w - z) / [(w - z)^2 + x^2] \\
\bar x &= -x^3 / [(w - z)^2 + x^2].
\end{aligned}$$

P8.3.5 Suppose $A \in \mathbb{C}^{n \times n}$ is Hermitian. Show how to construct unitary $Q$ such that
$Q^H A Q = T$ is real, symmetric, and tridiagonal.

P8.3.6 Show that if $A = B + iC$ is Hermitian, then $M = \begin{bmatrix} B & -C \\ C & B \end{bmatrix}$ is symmetric.
Relate the eigenvalues and eigenvectors of $A$ and $M$.

P8.3.7 Rewrite Algorithm 8.3.2 for the case when $A$ is stored in two $n$-vectors. Justify
the given flop count.

P8.3.8 Suppose $A = S + \sigma u u^T$ where $S \in \mathbb{R}^{n \times n}$ is skew-symmetric ($S^T = -S$), $u \in \mathbb{R}^n$
has unit 2-norm, and $\sigma \in \mathbb{R}$. Show how to compute an orthogonal $Q$ such that $Q^T A Q$
is tridiagonal and $Q^T u = I_n(:, 1) = e_1$.


Notes and References for Sec. 8.3 


The tridiagonalization of a symmetric matrix is discussed in 


R.S. Martin and J.H. Wilkinson (1968). “Householder’s Tridiagonalization of a Sym- 
metric Matrix,” Numer. Math. 11, 181-95. See also Wilkinson and Reinsch (1971, 
pp.212-26). 

H.R. Schwartz (1968). "Tridiagonalization of a Symmetric Band Matrix,” Numer. Math. 
12, 231-41. See also Wilkinson and Reinsch (1971, pp.273-83). 

N.E. Gibbs and W.G. Poole, Jr. (1974). “Tridiagonalization by Permutations,” Comm. 
ACM 17, 20-24. 


The first two references contain Algol programs. Algol procedures for the explicit and 
implicit tridiagonal QR algorithm are given in 


H. Bowdler, R.S. Martin, C. Reinsch, and J.H. Wilkinson (1968). "The QR and QL 
Algorithms for Symmetric Matrices," Numer. Math. 11, 293-306. See also Wilkinson 
and Reinsch (1971, pp.227-40). 

A. Dubrulle, R.S. Martin, and J.H. Wilkinson (1968). "The Implicit QL Algorithm," 
Numer. Math. 12, 377-83. See also Wilkinson and Reinsch (1971, pp.241-48).


The “QL” algorithm is identical to the QR algorithm except that at each step the matrix
$T - \lambda I$ is factored into a product of an orthogonal matrix and a lower triangular matrix.
Other papers concerned with these methods include 


G.W. Stewart (1970). “Incorporating Origin Shifts into the QR Algorithm for
Symmetric Tridiagonal Matrices,” Comm. ACM 13, 365-67.

A. Dubrulle (1970). “A Short Note on the Implicit QL Algorithm for Symmetric Tridi- 
agonal Matrices,” Numer. Math. 15, 450. 


Extensions to Hermitian and skew-symmetric matrices are described in 


D. Mueller (1966). “Householder’s Method for Complex Matrices and Hermitian Matri- 
ces,” Numer. Math. 8, 72-92. 

R.C. Ward and L.J. Gray (1978). “Eigensystem Computation for Skew-Symmetric and 
A Class of Symmetric Matrices,” ACM Trans. Math. Soft. 4, 278-85. 
The convergence properties of Algorithm 8.3.3 are detailed in Lawson and Hanson (1974,
Appendix B), as well as in


J.H. Wilkinson (1968b). “Global Convergence of Tridiagonal QR Algorithm With Origin
Shifts,” Lin. Alg. and Its Applic. 1, 409-20.

T.J. Dekker and J.F. Traub (1971). “The Shifted QR Algorithm for Hermitian Matrices,” 
Lin. Alg. and Its Applic. 4, 137-54. 

W. Hoffman and B.N. Parlett (1978). “A New Proof of Global Convergence for the 
Tridiagonal QL Algorithm,” SIAM J. Num. Anal. 15, 929-37. 

S. Batterson (1994). “Convergence of the Francis Shifted QR Algorithm on Normal 
Matrices,” Lin. Alg. and Its Applic. 207, 181-195. 


For an analysis of the method when it is applied to normal matrices see 


C.P. Huang (1981). “On the Convergence of the QR Algorithm with Origin Shifts for 
Normal Matrices,” IMA J. Num. Anal. 1, 127-33. 


Interesting papers concerned with shifting in the tridiagonal QR algorithm include 


F.L. Bauer and C. Reinsch (1968). “Rational QR Transformations with Newton Shift 
for Symmetric Tridiagonal Matrices,” Numer. Math. 11, 264-72. See also Wilkinson 
and Reinsch (1971, pp.257-65). 

G.W. Stewart (1970). “Incorporating Origin Shifts into the QR Algorithm for Symmetric 
Tridiagonal Matrices,” Comm. Assoc. Comp. Mach. 13, 365-67. 


Some parallel computation possibilities for the algorithms in this section are discussed in 


S. Lo, B. Philippe, and A. Sameh (1987). “A Multiprocessor Algorithm for the
Symmetric Tridiagonal Eigenvalue Problem,” SIAM J. Sci. and Stat. Comp. 8, s155-s165.

H.Y. Chang and M. Salama (1988). “A Parallel Householder Tridiagonalization Strategy 
Using Scattered Square Decomposition,” Parallel Computing 6, 297-312. 


Another way to compute a specified subset of eigenvalues is via the rational QR algo- 
rithm. In this method, the shift is determined using Newton’s method. This makes it 
possible to “steer” the iteration towards desired eigenvalues. See 


C. Reinsch and F.L. Bauer (1968). “Rational QR Transformation with Newton’s Shift
for Symmetric Tridiagonal Matrices,” Numer. Math. 11, 264-72. See also Wilkinson
and Reinsch (1971, pp.257-65).


Papers concerned with the symmetric QR algorithm for banded matrices include 


R.S. Martin and J.H. Wilkinson (1967). "Solution of Symmetric and Unsymmetric Band
Equations and the Calculation of Eigenvectors of Band Matrices," Numer. Math. 9,
279-301. See also Wilkinson and Reinsch (1971, pp.70-92).

R.S. Martin, C. Reinsch, and J.H. Wilkinson (1970). "The QR Algorithm for Band 
Symmetric Matrices," Numer. Math. 16, 85-92. See also Wilkinson and Reinsch 
(1971, pp.266-72). 
8.4 Jacobi Methods


Jacobi methods for the symmetric eigenvalue problem attract current attention
because they are inherently parallel. They work by performing a
sequence of orthogonal similarity updates $A \leftarrow Q^T A Q$ with the property
that each new $A$, although full, is "more diagonal" than its predecessor.
Eventually, the off-diagonal entries are small enough to be declared zero.

After surveying the basic ideas behind the Jacobi approach we develop 
a parallel Jacobi procedure. 


8.4.1 The Jacobi Idea 


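The idea behind Jacobi’s method is to systematically reduce the quantity

$$\mathrm{off}(A) = \sqrt{ \sum_{i=1}^n \sum_{\substack{j=1 \\ j \neq i}}^n a_{ij}^2 },$$

i.e., the "norm" of the off-diagonal elements. The tools for doing this are
rotations of the form

$$J(p, q, \theta) = \begin{bmatrix}
1 & \cdots & 0 & \cdots & 0 & \cdots & 0 \\
\vdots & \ddots & \vdots & & \vdots & & \vdots \\
0 & \cdots & c & \cdots & s & \cdots & 0 \\
\vdots & & \vdots & \ddots & \vdots & & \vdots \\
0 & \cdots & -s & \cdots & c & \cdots & 0 \\
\vdots & & \vdots & & \vdots & \ddots & \vdots \\
0 & \cdots & 0 & \cdots & 0 & \cdots & 1
\end{bmatrix}$$

where $c = \cos(\theta)$ and $s = \sin(\theta)$ occupy rows and columns $p$ and $q$. We call these
Jacobi rotations. Jacobi rotations are no different from Givens rotations, cf. §5.1.8.
We submit to the name change in this section to honor the inventor.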

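The basic step in a Jacobi eigenvalue procedure involves (1) choosing an 
index pair (p,q) that satisfies $1 \le p < q \le n$, (2) computing a cosine-sine 
pair (c,s) such that 

$$\begin{bmatrix} b_{pp} & b_{pq}\\ b_{qp} & b_{qq}\end{bmatrix} = \begin{bmatrix} c & s\\ -s & c\end{bmatrix}^{T} \begin{bmatrix} a_{pp} & a_{pq}\\ a_{qp} & a_{qq}\end{bmatrix} \begin{bmatrix} c & s\\ -s & c\end{bmatrix} \qquad (8.4.1)$$

is diagonal, and (3) overwriting A with $B = J^TAJ$ where $J = J(p,q,\theta)$. 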
Observe that the matrix B agrees with A except in rows and columns p 
and q. Moreover, since the Frobenius norm is preserved by orthogonal 
transformations we find that 

$$a_{pp}^2 + a_{qq}^2 + 2a_{pq}^2 = b_{pp}^2 + b_{qq}^2 + 2b_{pq}^2 = b_{pp}^2 + b_{qq}^2$$


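and so 

$$\begin{aligned}
\mathrm{off}(B)^2 &= \|B\|_F^2 - \sum_{i=1}^{n} b_{ii}^2 \\
&= \|A\|_F^2 - \sum_{i=1}^{n} a_{ii}^2 + \left(a_{pp}^2 + a_{qq}^2 - b_{pp}^2 - b_{qq}^2\right) \\
&= \mathrm{off}(A)^2 - 2a_{pq}^2 .
\end{aligned} \qquad (8.4.2)$$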


It is in this sense that A moves closer to diagonal form with each Jacobi 
step. 

Before we discuss how the index pair (p, q) can be chosen, let us look at 
the actual computations associated with the (p,q) subproblem. 
8.4.2 The 2-by-2 Symmetric Schur Decomposition 
To say that we diagonalize in (8.4.1) is to say that 

$$0 = b_{pq} = a_{pq}(c^2 - s^2) + (a_{pp} - a_{qq})cs. \qquad (8.4.3)$$

If $a_{pq} = 0$, then we just set (c,s) = (1,0). Otherwise define 

$$\tau = \frac{a_{qq} - a_{pp}}{2a_{pq}} \qquad\text{and}\qquad t = s/c$$

and conclude from (8.4.3) that t = tan(θ) solves the quadratic 

$$t^2 + 2\tau t - 1 = 0.$$

It turns out to be important to select the smaller (in magnitude) of the two roots, 

$$t = -\tau \pm \sqrt{1 + \tau^2},$$

whereupon c and s can be resolved from the formulae 

$$c = 1/\sqrt{1 + t^2}, \qquad s = tc.$$


Choosing t to be the smaller of the two roots ensures that |θ| ≤ π/4 and 
has the effect of minimizing the difference between B and A because 

$$\|B - A\|_F^2 = 4(1 - c)\sum_{\substack{i=1\\ i\neq p,q}}^{n}\left(a_{ip}^2 + a_{iq}^2\right) + 2a_{pq}^2/c^2 .$$
We summarize the 2-by-2 computations as follows: 


Algorithm 8.4.1 Given an n-by-n symmetric A and integers p and q that 
satisfy $1 \le p < q \le n$, this algorithm computes a cosine-sine pair (c,s) 
such that if $B = J(p,q,\theta)^TAJ(p,q,\theta)$ then $b_{pq} = b_{qp} = 0$. 


function: [c, s] = sym.schur2(A, p, q) 
  if A(p,q) ≠ 0 
    τ = (A(q,q) - A(p,p))/(2A(p,q)) 
    if τ ≥ 0 
      t = 1/(τ + sqrt(1 + τ^2)); 
    else 
      t = -1/(-τ + sqrt(1 + τ^2)); 
    end 
    c = 1/sqrt(1 + t^2) 
    s = tc 
  else 
    c = 1 
    s = 0 
  end 
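As a concrete check on Algorithm 8.4.1, here is a minimal Python sketch, assuming 0-based indices and dense lists of lists. The helper names `off`, `sym_schur2`, and `jacobi_update` are illustrative, not from the text; `jacobi_update` forms $B = J(p,q,\theta)^TAJ(p,q,\theta)$ by touching only rows and columns p and q. The last lines apply one rotation to the matrix of Example 8.4.1 and compare $\mathrm{off}(B)^2$ with $\mathrm{off}(A)^2 - 2a_{pq}^2$ from (8.4.2).

```python
import math


def off(A):
    """Frobenius norm of the off-diagonal part of A."""
    n = len(A)
    return math.sqrt(sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j))


def sym_schur2(A, p, q):
    """Cosine-sine pair (c, s) that zeroes the (p, q) entry of J^T A J.

    J = J(p, q, theta) has J[p][p] = J[q][q] = c, J[p][q] = s, J[q][p] = -s.
    Indices are 0-based.
    """
    if A[p][q] != 0.0:
        tau = (A[q][q] - A[p][p]) / (2.0 * A[p][q])
        # Smaller-magnitude root of t^2 + 2*tau*t - 1 = 0, written without cancellation.
        if tau >= 0:
            t = 1.0 / (tau + math.sqrt(1.0 + tau * tau))
        else:
            t = -1.0 / (-tau + math.sqrt(1.0 + tau * tau))
        c = 1.0 / math.sqrt(1.0 + t * t)
        s = t * c
    else:
        c, s = 1.0, 0.0
    return c, s


def jacobi_update(A, p, q, c, s):
    """Return a new matrix B = J^T A J; only rows and columns p and q change."""
    n = len(A)
    B = [row[:] for row in A]
    for k in range(n):  # B <- B J (columns p and q)
        bkp, bkq = B[k][p], B[k][q]
        B[k][p] = c * bkp - s * bkq
        B[k][q] = s * bkp + c * bkq
    for k in range(n):  # B <- J^T B (rows p and q)
        bpk, bqk = B[p][k], B[q][k]
        B[p][k] = c * bpk - s * bqk
        B[q][k] = s * bpk + c * bqk
    return B


A = [[1.0, 1.0, 1.0, 1.0],
     [1.0, 2.0, 3.0, 4.0],
     [1.0, 3.0, 6.0, 10.0],
     [1.0, 4.0, 10.0, 20.0]]
p, q = 1, 3                      # the text's (p, q) = (2, 4)
c, s = sym_schur2(A, p, q)
B = jacobi_update(A, p, q, c, s)
print(B[p][q], off(A) ** 2 - 2 * A[p][q] ** 2, off(B) ** 2)
```

Because t is the smaller root, |s| ≤ |c|, which is the |θ| ≤ π/4 condition discussed above.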


8.4.3 The Classical Jacobi Algorithm 


As we mentioned above, only rows and columns p and q are altered when 
the (p,q) subproblem is solved. Once sym.schur2 determines the 2-by-2 
rotation, then the update $A \leftarrow J(p,q,\theta)^TAJ(p,q,\theta)$ can be 
implemented in 6n flops if symmetry is exploited. 

How do we choose the indices p and q? From the standpoint of maximizing 
the reduction of off(A) in (8.4.2), it makes sense to choose (p,q) so 
that $a_{pq}^2$ is maximal. This is the basis of the classical Jacobi algorithm. 


Algorithm 8.4.2 (Classical Jacobi) Given a symmetric $A \in \mathbb{R}^{n\times n}$ and 
a tolerance tol > 0, this algorithm overwrites A with $V^TAV$ where V is 
orthogonal and $\mathrm{off}(V^TAV) \le \mathrm{tol}\,\|A\|_F$. 


V = I_n; eps = tol·||A||_F 
while off(A) > eps 
  Choose (p,q) so |a_pq| = max_{i≠j} |a_ij| 
  (c,s) = sym.schur2(A,p,q) 
  A = J(p,q,θ)^T A J(p,q,θ) 
  V = V J(p,q,θ) 
end 
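A minimal Python sketch of Algorithm 8.4.2 follows, assuming dense lists of lists and 0-based indices. The names `classical_jacobi`, `rotate`, and the `max_rotations` safety cap are illustrative additions, not part of the algorithm as stated. The O(n^2) `max` search for the largest off-diagonal entry is exactly the cost that the next subsection sets out to avoid.

```python
import math


def off(A):
    n = len(A)
    return math.sqrt(sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j))


def fro(A):
    return math.sqrt(sum(x * x for row in A for x in row))


def sym_schur2(A, p, q):
    """Algorithm 8.4.1 with 0-based indices."""
    if A[p][q] == 0.0:
        return 1.0, 0.0
    tau = (A[q][q] - A[p][p]) / (2.0 * A[p][q])
    if tau >= 0:
        t = 1.0 / (tau + math.sqrt(1.0 + tau * tau))
    else:
        t = -1.0 / (-tau + math.sqrt(1.0 + tau * tau))
    c = 1.0 / math.sqrt(1.0 + t * t)
    return c, t * c


def rotate(A, V, p, q, c, s):
    """In place: A <- J^T A J and V <- V J with J = J(p, q, theta)."""
    n = len(A)
    for k in range(n):
        akp, akq = A[k][p], A[k][q]
        A[k][p], A[k][q] = c * akp - s * akq, s * akp + c * akq
        vkp, vkq = V[k][p], V[k][q]
        V[k][p], V[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    for k in range(n):
        apk, aqk = A[p][k], A[q][k]
        A[p][k], A[q][k] = c * apk - s * aqk, s * apk + c * aqk


def classical_jacobi(A, tol=1e-12, max_rotations=10_000):
    """Sketch of Algorithm 8.4.2; max_rotations is an added safety cap."""
    A = [[float(x) for x in row] for row in A]
    n = len(A)
    V = [[float(i == j) for j in range(n)] for i in range(n)]
    eps = tol * fro(A)
    count = 0
    while off(A) > eps and count < max_rotations:
        # O(n^2) search for the largest |a_pq| with p < q
        p, q = max(((i, j) for i in range(n) for j in range(i + 1, n)),
                   key=lambda ij: abs(A[ij[0]][ij[1]]))
        c, s = sym_schur2(A, p, q)
        rotate(A, V, p, q, c, s)
        count += 1
    return A, V, count


A0 = [[1, 1, 1, 1], [1, 2, 3, 4], [1, 3, 6, 10], [1, 4, 10, 20]]
D, V, count = classical_jacobi(A0)
print(count, sorted(D[i][i] for i in range(4)))
```

On the matrix of Example 8.4.1 the returned V satisfies A ≈ V·diag(D)·V^T; the rotation count printed depends on floating-point details, so no particular value is claimed here.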


Since $|a_{pq}|$ is the largest off-diagonal entry, $\mathrm{off}(A)^2 \le N(a_{pq}^2 + a_{qp}^2)$ where 
N = n(n - 1)/2. From (8.4.2) it follows that 

$$\mathrm{off}(B)^2 \le \left(1 - \frac{1}{N}\right)\mathrm{off}(A)^2 .$$

By induction, if $A^{(k)}$ denotes the matrix A after k Jacobi updates, then 

$$\mathrm{off}(A^{(k)})^2 \le \left(1 - \frac{1}{N}\right)^{k}\mathrm{off}(A^{(0)})^2 .$$


This implies that the classical Jacobi procedure converges at a linear rate. 

However, the asymptotic convergence rate of the method is considerably 
better than linear. Schönhage (1964) and van Kempen (1966) show that 
for k large enough, there is a constant c such that 

$$\mathrm{off}(A^{(k+N)}) \le c\cdot\mathrm{off}(A^{(k)})^2,$$

i.e., quadratic convergence. An earlier paper by Henrici (1958) established 
the same result for the special case when A has distinct eigenvalues. In 
the convergence theory for the Jacobi iteration, it is critical that |θ| ≤ π/4. 
Among other things this precludes the possibility of “interchanging” nearly 
converged diagonal entries. This follows from the formulae $b_{pp} = a_{pp} - t\,a_{pq}$ 
and $b_{qq} = a_{qq} + t\,a_{pq}$, which can be derived from equations (8.4.1) and the 
definition t = sin(θ)/cos(θ). 

It is customary to refer to N Jacobi updates as a sweep. Thus, after 
a sufficient number of iterations, quadratic convergence is observed when 
examining off(A) after every sweep. 


Example 8.4.1 Applying the classical Jacobi iteration to 

$$A = \begin{bmatrix} 1 & 1 & 1 & 1\\ 1 & 2 & 3 & 4\\ 1 & 3 & 6 & 10\\ 1 & 4 & 10 & 20\end{bmatrix}$$

we find 

There is no rigorous theory that enables one to predict the number of 
sweeps that are required to achieve a specified reduction in off(A). However, 
Brent and Luk (1985) have argued heuristically that the number of sweeps 
is proportional to log(n) and this seems to be the case in practice. 
8.4.4 The Cyclic-by-Row Algorithm 


The trouble with the classical Jacobi method is that the updates involve 
O(n) flops while the search for the optimal (p,q) is O(n^2). One way to 
address this imbalance is to fix the sequence of subproblems to be solved 
in advance. A reasonable possibility is to step through all the subproblems 
in row-by-row fashion. For example, if n = 4 we cycle as follows: 


(p,q) = (1,2), (1,3), (1,4), (2,3), (2,4), (3,4), (1,2), ... 


This ordering scheme is referred to as cyclic-by-row and it results in the 
following procedure: 


Algorithm 8.4.3 (Cyclic Jacobi) Given a symmetric $A \in \mathbb{R}^{n\times n}$ and 
a tolerance tol > 0, this algorithm overwrites A with $V^TAV$ where V is 
orthogonal and $\mathrm{off}(V^TAV) \le \mathrm{tol}\,\|A\|_F$. 


V = I_n 
eps = tol·||A||_F 
while off(A) > eps 
  for p = 1:n-1 
    for q = p+1:n 
      (c,s) = sym.schur2(A,p,q) 
      A = J(p,q,θ)^T A J(p,q,θ) 
      V = V J(p,q,θ) 
    end 
  end 
end 
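The cyclic-by-row ordering drops the search entirely. The Python sketch below is a hedged transcription of Algorithm 8.4.3 with a hypothetical `max_sweeps` cap added. Each rotation costs O(n) work. The sketch records off(A) after every sweep, which is the quantity tabulated in Example 8.4.2. The numbers it prints come from this implementation and are not taken from the text.

```python
import math


def off(A):
    n = len(A)
    return math.sqrt(sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j))


def fro(A):
    return math.sqrt(sum(x * x for row in A for x in row))


def sym_schur2(A, p, q):
    """Algorithm 8.4.1 with 0-based indices."""
    if A[p][q] == 0.0:
        return 1.0, 0.0
    tau = (A[q][q] - A[p][p]) / (2.0 * A[p][q])
    if tau >= 0:
        t = 1.0 / (tau + math.sqrt(1.0 + tau * tau))
    else:
        t = -1.0 / (-tau + math.sqrt(1.0 + tau * tau))
    c = 1.0 / math.sqrt(1.0 + t * t)
    return c, t * c


def rotate(A, V, p, q, c, s):
    """A <- J^T A J and V <- V J in O(n) work: only rows/columns p, q change."""
    for k in range(len(A)):
        akp, akq = A[k][p], A[k][q]
        A[k][p], A[k][q] = c * akp - s * akq, s * akp + c * akq
        vkp, vkq = V[k][p], V[k][q]
        V[k][p], V[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    for k in range(len(A)):
        apk, aqk = A[p][k], A[q][k]
        A[p][k], A[q][k] = c * apk - s * aqk, s * apk + c * aqk


def cyclic_jacobi(A, tol=1e-12, max_sweeps=30):
    """Cyclic-by-row Jacobi; returns (A, V, off(A) after each sweep).
    max_sweeps is an added safeguard, not part of Algorithm 8.4.3."""
    A = [[float(x) for x in row] for row in A]
    n = len(A)
    V = [[float(i == j) for j in range(n)] for i in range(n)]
    eps = tol * fro(A)
    history = [off(A)]
    while history[-1] > eps and len(history) <= max_sweeps:
        for p in range(n - 1):          # (1,2), (1,3), ..., (n-1,n) in 1-based terms
            for q in range(p + 1, n):
                c, s = sym_schur2(A, p, q)
                rotate(A, V, p, q, c, s)
        history.append(off(A))
    return A, V, history


A0 = [[1, 1, 1, 1], [1, 2, 3, 4], [1, 3, 6, 10], [1, 4, 10, 20]]
D, V, history = cyclic_jacobi(A0)
for sweep, value in enumerate(history):
    print(f"sweep {sweep}: off(A) = {value:.1e}")
```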


Cyclic Jacobi also converges quadratically. (See Wilkinson (1962) and van 
Kempen (1966).) However, since it does not require off-diagonal search, it 
is considerably faster than Jacobi's original algorithm. 


Example 8.4.2 If the cyclic Jacobi method is applied to the matrix in Example 8.4.1 
we find 


O(off(A)) 
8.4.5 Error Analysis 


Using Wilkinson's error analysis it is possible to show that if r sweeps are 
needed in Algorithm 8.4.3 then the computed $d_i$ satisfy 

$$\sum_{i=1}^{n}(d_i - \lambda_i)^2 \le (\delta + k_r)\,\|A\|_F\,\mathbf{u}$$

for some ordering of A's eigenvalues $\lambda_i$. The parameter $k_r$ depends mildly 
on r. 

Although the cyclic Jacobi method converges quadratically, it is not 
generally competitive with the symmetric QR algorithm. For example, if 
we just count flops, then 2 sweeps of Jacobi are roughly equivalent to a 
complete QR reduction to diagonal form with accumulation of transformations. 
However, for small n this liability is not very dramatic. Moreover, if an 
approximate eigenvector matrix V is known, then $V^TAV$ is almost diagonal, 
a situation that Jacobi can exploit but not QR. 

Another interesting feature of the Jacobi method is that it can compute 
the eigenvalues with small relative error if A is positive definite. To 
appreciate this point, note that the Wilkinson analysis cited above coupled 
with the §8.1 perturbation theory ensures that the computed eigenvalues 
$\hat\lambda_1 \ge \cdots \ge \hat\lambda_n$ satisfy 

$$\frac{|\hat\lambda_i - \lambda_i(A)|}{\lambda_i(A)} \approx \mathbf{u}\,\frac{\|A\|_2}{\lambda_{\min}(A)} = \mathbf{u}\,\kappa_2(A).$$
However, a refined, componentwise error analysis by Demmel and Veselić 
(1992) shows that in the positive definite case 

$$\frac{|\hat\lambda_i - \lambda_i(A)|}{\lambda_i(A)} \approx \mathbf{u}\,\kappa_2(D^{-1}AD^{-1}) \qquad (8.4.4)$$

where $D = \mathrm{diag}(\sqrt{a_{11}},\ldots,\sqrt{a_{nn}})$, and this is generally a much smaller 
approximating bound. The key to establishing this result is some new 
perturbation theory and a demonstration that if $A_+$ is a computed Jacobi update 
obtained from the current matrix $A_c$, then the eigenvalues of $A_+$ are 
relatively close to the eigenvalues of $A_c$ in the sense of (8.4.4). To make the 
whole thing work in practice, the termination criterion is not based upon 
the comparison of off(A) with $\mathbf{u}\|A\|_F$ but rather on the size of each $|a_{ij}|$ 
compared to $\mathbf{u}\sqrt{a_{ii}a_{jj}}$. This work is typical of a new genre of research 
concerned with high-accuracy algorithms based upon careful, componentwise 
error analysis. See Mathias (1995). 
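The gap between $\mathbf{u}\kappa_2(A)$ and $\mathbf{u}\kappa_2(D^{-1}AD^{-1})$ is easy to see on a 2-by-2 example. In the sketch below, the matrix A = DHD with H = [1 .5; .5 1] and D = diag(10^4, 1) is made-up illustrative data, and the eigenvalues come from the closed-form 2-by-2 formula. The two predicate functions contrast the absolute test off(A) ≤ u‖A‖_F with the componentwise test $|a_{ij}| \le \mathbf{u}\sqrt{a_{ii}a_{jj}}$ described above. They are a simplified illustration of the idea, not Demmel and Veselić's actual code.

```python
import math
import sys

U = sys.float_info.epsilon / 2  # unit roundoff u for IEEE double precision


def eig2_spd(a, b, d):
    """Eigenvalues (lam_max, lam_min) of the SPD matrix [[a, b], [b, d]].
    lam_min is recovered as det / lam_max to avoid cancellation."""
    lam_max = 0.5 * (a + d) + math.sqrt((0.5 * (a - d)) ** 2 + b * b)
    return lam_max, (a * d - b * b) / lam_max


def kappa2(a, b, d):
    lam_max, lam_min = eig2_spd(a, b, d)
    return lam_max / lam_min


def kappa2_scaled(a, b, d):
    """kappa_2(D^{-1} A D^{-1}) with D = diag(sqrt(a), sqrt(d)): unit diagonal."""
    return kappa2(1.0, b / math.sqrt(a * d), 1.0)


def absolute_test(a, b, d):
    """off(A) <= u * ||A||_F."""
    return math.sqrt(2.0) * abs(b) <= U * math.sqrt(a * a + 2 * b * b + d * d)


def relative_test(a, b, d):
    """|a_12| <= u * sqrt(a_11 * a_22)."""
    return abs(b) <= U * math.sqrt(a * d)


a, b, d = 1e8, 5e3, 1.0   # illustrative badly scaled SPD matrix
print(f"kappa2(A) = {kappa2(a, b, d):.3e}, kappa2(D^-1 A D^-1) = {kappa2_scaled(a, b, d):.3f}")
# A nearly diagonal matrix that passes the absolute test but not the relative one.
print(absolute_test(1e8, 1e-9, 1e-16), relative_test(1e8, 1e-9, 1e-16))   # True False
```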


8.4.6 Parallel Jacobi 


Perhaps the most interesting distinction between the QR and Jacobi approaches 
to the symmetric eigenvalue problem is the rich inherent parallelism of the 
latter algorithm. To illustrate this, suppose n = 4 and group 
the six subproblems into three rotation sets as follows: 


rot.set(1) = {(1,2), (3,4)} 
rot.set(2) = {(1,3), (2,4)} 
rot.set(3) = {(1,4), (2,3)} 


Note that all the rotations within each of the three rotation sets are “non- 
conflicting." That is, subproblems (1,2) and (3,4) can be carried out in 
parallel. Likewise the (1,3) and (2,4) subproblems can be executed in par- 
allel as can subproblems (1,4) and (2,3). In general, we say that 


$$(i_1, j_1),\ (i_2, j_2),\ \ldots,\ (i_N, j_N), \qquad N = (n-1)n/2$$

is a parallel ordering of the set {(i,j) | 1 ≤ i < j ≤ n} if for s = 1:n-1 
the rotation set rot.set(s) = { $(i_r, j_r)$ : r = 1 + n(s-1)/2 : ns/2 } consists 
of nonconflicting rotations. This requires n to be even, which we assume 
throughout this section. (The odd n case can be handled by bordering 
A with a row and column of zeros and being careful when solving the 
subproblems that involve these augmented zeros.) 

A good way to generate a parallel ordering is to visualize a chess tournament 
with n players in which everybody must play everybody else exactly 
once. In the n = 8 case this entails 7 “rounds.” During round one we have 
the following four games: 


  1 | 3 | 5 | 7 
  2 | 4 | 6 | 8          rot.set(1) = {(1,2), (3,4), (5,6), (7,8)} 

i.e., 1 plays 2, 3 plays 4, etc. To set up rounds 2 through 7, player 1 stays 
put and players 2 through 8 embark on a merry-go-round: 


rot.set(2) = {(1,4), (2,6), (3,8), (5,7)} 
rot.set(3) = {(1,6), (4,8), (2,7), (3,5)} 
rot.set(4) = {(1,8), (6,7), (4,5), (2,3)} 
rot.set(5) = {(1,7), (5,8), (3,6), (2,4)} 
rot.set(6) = {(1,5), (3,7), (2,8), (4,6)} 
rot.set(7) = {(1,3), (2,5), (4,7), (6,8)} 


We can encode these operations in a pair of integer vectors top(1:n/2) and 
bot(1:n/2). During a given round top(k) plays bot(k), k = 1:n/2. The 
pairings for the next round are obtained by updating top and bot as follows: 


function: [new.top, new.bot] = music(top, bot, n) 
  m = n/2 
  for k = 1:m 
    if k = 1 
      new.top(1) = 1 
    elseif k = 2 
      new.top(k) = bot(1) 
    elseif k > 2 
      new.top(k) = top(k-1) 
    end 
    if k = m 
      new.bot(k) = top(k) 
    else 
      new.bot(k) = bot(k+1) 
    end 
  end 
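The merry-go-round is easy to check mechanically. This Python sketch transcribes music with 0-based list positions, while player labels stay 1-based. The argument n is dropped because it equals twice the list length. The sketch also generates all n-1 rotation sets through a hypothetical helper `rotation_sets`. For n = 8 it reproduces rot.set(1) through rot.set(7) listed above. One can also verify that every pair appears exactly once and that no index repeats within a set.

```python
def music(top, bot):
    """One merry-go-round step (the text's music, with 0-based positions k)."""
    m = len(top)
    new_top, new_bot = [0] * m, [0] * m
    for k in range(m):
        if k == 0:
            new_top[k] = 1                 # player 1 stays put
        elif k == 1:
            new_top[k] = bot[0]
        else:
            new_top[k] = top[k - 1]
        if k == m - 1:
            new_bot[k] = top[k]
        else:
            new_bot[k] = bot[k + 1]
    return new_top, new_bot


def rotation_sets(n):
    """All n - 1 rotation sets for even n >= 4, each pair written as (min, max)."""
    top, bot = list(range(1, n + 1, 2)), list(range(2, n + 1, 2))
    sets = []
    for _ in range(n - 1):
        sets.append([(min(x, y), max(x, y)) for x, y in zip(top, bot)])
        top, bot = music(top, bot)
    return sets


for s, pairs in enumerate(rotation_sets(8), start=1):
    print(f"rot.set({s}) =", pairs)
```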


Using music we obtain the following parallel order Jacobi procedure. 


Algorithm 8.4.4 (Parallel Order Jacobi) Given a symmetric $A \in \mathbb{R}^{n\times n}$ 
and a tolerance tol > 0, this algorithm overwrites A with $V^TAV$ where V 
is orthogonal and $\mathrm{off}(V^TAV) \le \mathrm{tol}\,\|A\|_F$. It is assumed that n is even. 


V = I_n 
eps = tol·||A||_F 
top = 1:2:n; bot = 2:2:n 
while off(A) > eps 
  for set = 1:n-1 
    for k = 1:n/2 
      p = min(top(k), bot(k)) 
      q = max(top(k), bot(k)) 
      (c,s) = sym.schur2(A,p,q) 
      A = J(p,q,θ)^T A J(p,q,θ) 
      V = V J(p,q,θ) 
    end 
    [top, bot] = music(top, bot, n) 
  end 
end 
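Here is a Python sketch of Algorithm 8.4.4. It assumes even n ≥ 4 and 0-based column indices, and it adds a `max_sweeps` cap that the algorithm does not have. `music` is rewritten with list slicing and performs the same update as the function above. The inner k-loop runs sequentially. The rotations in a set act on disjoint index pairs, so in exact arithmetic this gives the same result as executing them in parallel.

```python
import math


def off(A):
    n = len(A)
    return math.sqrt(sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j))


def fro(A):
    return math.sqrt(sum(x * x for row in A for x in row))


def sym_schur2(A, p, q):
    """Algorithm 8.4.1 (0-based); sign(tau)/(|tau| + sqrt(1+tau^2)) is the small root."""
    if A[p][q] == 0.0:
        return 1.0, 0.0
    tau = (A[q][q] - A[p][p]) / (2.0 * A[p][q])
    t = (1.0 if tau >= 0 else -1.0) / (abs(tau) + math.sqrt(1.0 + tau * tau))
    c = 1.0 / math.sqrt(1.0 + t * t)
    return c, t * c


def rotate(A, V, p, q, c, s):
    """In place: A <- J^T A J and V <- V J."""
    for k in range(len(A)):
        akp, akq = A[k][p], A[k][q]
        A[k][p], A[k][q] = c * akp - s * akq, s * akp + c * akq
        vkp, vkq = V[k][p], V[k][q]
        V[k][p], V[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    for k in range(len(A)):
        apk, aqk = A[p][k], A[q][k]
        A[p][k], A[q][k] = c * apk - s * aqk, s * apk + c * aqk


def music(top, bot):
    """Merry-go-round update: top[0] stays, everyone else moves one slot."""
    m = len(top)
    return [top[0], bot[0]] + top[1:m - 1], bot[1:] + [top[m - 1]]


def parallel_order_jacobi(A, tol=1e-12, max_sweeps=30):
    A = [[float(x) for x in row] for row in A]
    n = len(A)
    if n % 2 or n < 4:
        raise ValueError("this sketch assumes even n >= 4")
    V = [[float(i == j) for j in range(n)] for i in range(n)]
    eps = tol * fro(A)
    top, bot = list(range(0, n, 2)), list(range(1, n, 2))   # 0-based 1:2:n and 2:2:n
    sweeps = 0
    while off(A) > eps and sweeps < max_sweeps:
        for _ in range(n - 1):
            # The n/2 subproblems below touch disjoint index pairs (nonconflicting).
            for k in range(n // 2):
                p, q = min(top[k], bot[k]), max(top[k], bot[k])
                c, s = sym_schur2(A, p, q)
                rotate(A, V, p, q, c, s)
            top, bot = music(top, bot)
        sweeps += 1
    return A, V, sweeps


A0 = [[float(min(i, j) + 1) for j in range(6)] for i in range(6)]   # illustrative data
D, V, sweeps = parallel_order_jacobi(A0)
print(sweeps, sorted(D[i][i] for i in range(6)))
```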
Notice that the k-loop steps through n/2 independent, nonconflicting subproblems. 


8.4.7 A Ring Procedure 


We now discuss how Algorithm 8.4.4 could be implemented on a ring of p 
processors. We assume that p = n/2 for clarity. At any instant, Proc(μ) 
houses two columns of A and the corresponding V columns. For example, 
if n = 8 then here is how the column distribution of A proceeds from step 
to step: 


           Proc(1)   Proc(2)   Proc(3)   Proc(4) 
Step 1:    [1 2]     [3 4]     [5 6]     [7 8] 
Step 2:    [1 4]     [2 6]     [3 8]     [5 7] 
Step 3:    [1 6]     [4 8]     [2 7]     [3 5] 
etc. 


The ordered pairs denote the indices of the housed columns. The first index 
names the left column and the second index names the right column. Thus, 
the left and right columns in Proc(3) during step 3 are 2 and 7 respectively. 

Note that in between steps, the columns are shuffled according to the 
permutation implicit in music and that nearest neighbor communication 
prevails. At each step, each processor oversees a single subproblem. This 
involves (a) computing an orthogonal $V_{\mathrm{small}} \in \mathbb{R}^{2\times 2}$ that solves a local 
2-by-2 Schur problem, (b) using the 2-by-2 $V_{\mathrm{small}}$ to update the two housed 
columns of A and V, (c) sending the 2-by-2 $V_{\mathrm{small}}$ to all the other processors, 
and (d) receiving the $V_{\mathrm{small}}$ matrices from the other processors and 
updating the local portions of A and V accordingly. Since A is stored by 
column, communication is necessary to carry out the $V_{\mathrm{small}}$ updates 
because they affect rows of A. For example, in the second step of the n = 8 
problem, Proc(2) must receive the 2-by-2 rotations associated with 
subproblems (1,4), (3,8), and (5,7). These come from Proc(1), Proc(3), and 
Proc(4) respectively. In general, the sharing of the rotation matrices can 
be conveniently implemented by circulating the 2-by-2 $V_{\mathrm{small}}$ matrices in 
“merry-go-round” fashion around the ring. Each processor copies a passing 
2-by-2 $V_{\mathrm{small}}$ into its local memory and then appropriately updates the 
locally housed portions of A and V. 

The termination criterion in Algorithm 8.4.4 poses something of a problem 
in a distributed memory environment in that the values of off(·) and 
$\|A\|_F$ require access to all of A. However, these global quantities can be 
computed during the V matrix merry-go-round phase. Before the circulation 
of the V's begins, each processor can compute its contribution to 
$\|A\|_F$ and off(·). These quantities can then be summed by each processor 
if they are placed on the merry-go-round and read at each stop. By the 
end of one revolution each processor has its own copy of $\|A\|_F$ and off(·). 
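The termination bookkeeping can be simulated on a single machine. In this sketch each list position stands for one processor on the ring. The names `ring_allreduce` and `local_contributions` are hypothetical, and the 8-by-8 test matrix and the Step 1 column layout are illustrative. On each hop, every processor passes its packet one position to the right. After p - 1 hops, every processor has summed all p contributions to $\|A\|_F^2$ and $\mathrm{off}(A)^2$.

```python
def ring_allreduce(values):
    """Simulate the merry-go-round sum on a ring of p = len(values) processors.

    packets[mu] is the value currently sitting at Proc(mu); on each hop every
    packet moves one processor to the right and the receiving processor adds
    it to its running total. After p - 1 hops each total is the global sum.
    """
    p = len(values)
    totals = list(values)
    packets = list(values)
    for _ in range(p - 1):
        packets = [packets[(mu - 1) % p] for mu in range(p)]
        totals = [t + x for t, x in zip(totals, packets)]
    return totals


def local_contributions(A, cols):
    """Squared Frobenius norm of the housed columns and its off-diagonal part."""
    fro2 = sum(A[i][j] ** 2 for j in cols for i in range(len(A)))
    diag2 = sum(A[j][j] ** 2 for j in cols)
    return fro2, fro2 - diag2


n = 8
A = [[1.0 / (i + j + 1) for j in range(n)] for i in range(n)]   # illustrative data
housed = [[0, 1], [2, 3], [4, 5], [6, 7]]                        # Step 1 layout, 0-based
parts = [local_contributions(A, cols) for cols in housed]
fro2 = ring_allreduce([f for f, _ in parts])
off2 = ring_allreduce([o for _, o in parts])
print(fro2[0] ** 0.5, off2[0] ** 0.5)   # ||A||_F and off(A), identical on every processor
```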
8.4.8 Block Jacobi Procedures 


It is usually the case when solving the symmetric eigenvalue problem on a 
p-processor machine that n ≫ p. In this case a block version of the Jacobi 
algorithm may be appropriate. Block versions of the above procedures are 
straightforward. Suppose that n = rN and that we partition the n-by-n 
matrix A as follows: 


$$A = \begin{bmatrix} A_{11} & \cdots & A_{1N}\\ \vdots & & \vdots\\ A_{N1} & \cdots & A_{NN}\end{bmatrix}$$


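Here, each $A_{ij}$ is r-by-r. In block Jacobi the (p,q) subproblem involves 
computing the 2r-by-2r Schur decomposition 

$$\begin{bmatrix} V_{pp} & V_{pq}\\ V_{qp} & V_{qq}\end{bmatrix}^{T} \begin{bmatrix} A_{pp} & A_{pq}\\ A_{qp} & A_{qq}\end{bmatrix} \begin{bmatrix} V_{pp} & V_{pq}\\ V_{qp} & V_{qq}\end{bmatrix} = \begin{bmatrix} D_{pp} & 0\\ 0 & D_{qq}\end{bmatrix}$$

and then applying to A the block Jacobi rotation made up of the $V_{ij}$. If 
we call this block rotation V then it is easy to show that 

$$\mathrm{off}(V^TAV)^2 = \mathrm{off}(A)^2 - \left(2\|A_{pq}\|_F^2 + \mathrm{off}(A_{pp})^2 + \mathrm{off}(A_{qq})^2\right).$$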


Block Jacobi procedures have many interesting computational aspects. For 
example, there are many ways to solve the subproblems and the choice 
appears to be critical. See Bischof (1987). 


Problems 


P8.4.1 Let the scalar γ be given along with the matrix 

$$A = \begin{bmatrix} w & x\\ x & z\end{bmatrix}.$$

It is desired to compute an orthogonal matrix 

$$J = \begin{bmatrix} c & s\\ -s & c\end{bmatrix}$$

such that the (1,1) entry of $J^TAJ$ equals γ. Show that this requirement leads to the 
equation 

$$(w - \gamma)\tau^2 - 2x\tau + (z - \gamma) = 0,$$

where τ = c/s. Verify that this quadratic has real roots if γ satisfies $\lambda_2 \le \gamma \le \lambda_1$, where 
$\lambda_1$ and $\lambda_2$ are the eigenvalues of A. 


P8.4.2 Let $A \in \mathbb{R}^{n\times n}$ be symmetric. Give an algorithm that computes the factorization 

$$Q^TAQ = \gamma I + F$$

where Q is a product of Jacobi rotations, γ = trace(A)/n, and F has zero diagonal 
entries. Discuss the uniqueness of Q. 

P8.4.3 Formulate Jacobi procedures for (a) skew symmetric matrices and (b) complex 
Hermitian matrices. 


P8.4.4 Partition the n-by-n real symmetric matrix A as follows: 


Let Q be a Householder matrix such that if $B = Q^TAQ$, then B(3:n,1) = 0. Let 
J = J(1,2,θ) be determined such that if $C = J^TBJ$, then $c_{12} = 0$ and $c_{11} \ge c_{22}$. Show 
C11 > a4 || v ll. La Budde (1964) formulated an algorithm for the symmetric eigenvalue 
problem based upon repetition of this Householder-Jacobi computation. 


P8.4.5 Organize function music so that it involves minimum workspace. 


P8.4.6 When implementing cyclic Jacobi, it is sensible to skip the annihilation of $a_{pq}$ 
if its modulus is less than some small, sweep-dependent parameter, because the net re- 
duction in off( A) is not worth the cost. This leads to what is called the threshold Jacobi 
method. Details concerning this variant of Jacobi's algorithm may be found in Wilkinson 
(1965, p.277). Show that appropriate thresholding can guarantee convergence. 


Notes and References for Sec. 8.4 


Jacobi's original paper is one of the earliest references found in the numerical analysis 
literature: 


C.G.J. Jacobi (1846). “Über ein Leichtes Verfahren Die in der Theorie der Säcularstörungen 
Vorkommenden Gleichungen Numerisch Aufzulösen,” Crelle’s J. 30, 51-94. 
8.5 Tridiagonal Methods 


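In this section we develop special methods for the symmetric tridiagonal 
eigenproblem. The tridiagonal form 

$$T = \begin{bmatrix}
a_1 & b_1 & & \cdots & 0\\
b_1 & a_2 & \ddots & & \vdots\\
 & \ddots & \ddots & \ddots & \\
\vdots & & \ddots & \ddots & b_{n-1}\\
0 & \cdots & & b_{n-1} & a_n
\end{bmatrix} \qquad (8.5.1)$$

can be obtained by Householder reduction (cf. §8.3.1). However, symmetric 
tridiagonal eigenproblems arise naturally in many settings. 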

We first discuss bisection methods that are of interest when selected 
portions of the eigensystem are required. This is followed by the presen- 
tation of a divide and conquer algorithm that can be used to acquire the 
full symmetric Schur decomposition in a way that is amenable to parallel 
processing. 


8.5.1 Eigenvalues by Bisection 


Let $T_r$ denote the leading r-by-r principal submatrix of the matrix T in 
(8.5.1). Define the polynomials $p_r(x) = \det(T_r - xI)$, r = 1:n. A simple 
determinantal expansion shows that 

$$p_r(x) = (a_r - x)\,p_{r-1}(x) - b_{r-1}^2\,p_{r-2}(x) \qquad (8.5.2)$$

for r = 2:n if we set $p_0(x) = 1$. Because $p_n(x)$ can be evaluated in O(n) 
flops, it is feasible to find its roots using the method of bisection. For 
example, if $p_n(y)p_n(z) < 0$ and y < z, then the iteration 


while |y - z| > ε(|y| + |z|) 
  x = (y + z)/2 
  if p_n(x)p_n(y) < 0 
    z = x 
  else 
    y = x 
  end 
end 


is guaranteed to terminate with (y + z)/2 an approximate zero of $p_n(x)$, 
i.e., an approximate eigenvalue of T. The iteration converges linearly in 
that the error is approximately halved at each step. 
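Recurrence (8.5.2) and the bisection loop translate directly into Python. This is a hedged sketch. The names are illustrative, and `max_iter` and the exact-zero check are added safeguards. No scaling is done, so $p_n(x)$ can overflow or underflow for large n. The test data is the tridiagonal T used in Example 8.5.1.

```python
def char_poly(a, b, x):
    """p_n(x) = det(T - x I) via recurrence (8.5.2).

    a holds a_1..a_n and b holds b_1..b_{n-1} (0-based Python lists).
    """
    p_prev, p = 1.0, a[0] - x            # p_0(x), p_1(x)
    for r in range(1, len(a)):
        p_prev, p = p, (a[r] - x) * p - b[r - 1] ** 2 * p_prev
    return p


def bisect_eig(a, b, y, z, eps=1e-12, max_iter=200):
    """Bisection on p_n, assuming y < z and p_n(y) * p_n(z) < 0.

    max_iter is an added safeguard; the loop in the text has none.
    """
    py = char_poly(a, b, y)
    for _ in range(max_iter):
        if abs(y - z) <= eps * (abs(y) + abs(z)):
            break
        x = (y + z) / 2
        px = char_poly(a, b, x)
        if px == 0.0:
            return x                      # landed exactly on a zero
        if px * py < 0:
            z = x                         # sign change lies in [y, x]
        else:
            y, py = x, px                 # sign change lies in [x, z]
    return (y + z) / 2


a = [1.0, 2.0, 3.0, 4.0]                  # T of Example 8.5.1
b = [-1.0, -1.0, -1.0]
print(char_poly(a, b, 1.5), char_poly(a, b, 2.0))   # -2.1875 1.0
print(bisect_eig(a, b, 1.5, 2.0))
```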
8.5.2 Sturm Sequence Methods 


Sometimes it is necessary to compute the kth largest eigenvalue of T for 
some prescribed value of k. This can be done efficiently by using the bisec- 
tion idea and the following classical result: 


Theorem 8.5.1 (Sturm Sequence Property) If the tridiagonal matrix 
in (8.5.1) has no zero subdiagonal entries, then the eigenvalues of $T_{r-1}$ 
strictly separate the eigenvalues of $T_r$: 

$$\lambda_r(T_r) < \lambda_{r-1}(T_{r-1}) < \lambda_{r-1}(T_r) < \cdots < \lambda_2(T_r) < \lambda_1(T_{r-1}) < \lambda_1(T_r).$$

Moreover, if a(λ) denotes the number of sign changes in the sequence 

$$\{\, p_0(\lambda),\ p_1(\lambda),\ \ldots,\ p_n(\lambda) \,\}$$

then a(λ) equals the number of T's eigenvalues that are less than λ. Here, 
the polynomials $p_r(x)$ are defined by (8.5.2) and we have the convention 
that $p_r(\lambda)$ has the opposite sign of $p_{r-1}(\lambda)$ if $p_r(\lambda) = 0$. 


Proof. It follows from Theorem 8.1.7 that the eigenvalues of $T_{r-1}$ weakly 
separate those of $T_r$. To prove that the separation must be strict, suppose 
that $p_r(\mu) = p_{r-1}(\mu) = 0$ for some r and μ. It then follows from (8.5.2) 
and the assumption that T is unreduced that $p_0(\mu) = p_1(\mu) = \cdots = p_r(\mu) = 0$, 
a contradiction since $p_0 \equiv 1$. Thus, we must have strict separation. 

The assertion about a(λ) is established in Wilkinson (1965, pp. 300-301). 
We mention that if $p_r(\lambda) = 0$, then its sign is assumed to be opposite the 
sign of $p_{r-1}(\lambda)$. □ 


Example 8.5.1 If 

$$T = \begin{bmatrix} 1 & -1 & 0 & 0\\ -1 & 2 & -1 & 0\\ 0 & -1 & 3 & -1\\ 0 & 0 & -1 & 4\end{bmatrix}$$

then λ(T) ≈ {.254, 1.82, 3.18, 4.74}. The sequence 

$$\{\, p_0(2),\ p_1(2),\ p_2(2),\ p_3(2),\ p_4(2) \,\} = \{\, 1,\ -1,\ -1,\ 0,\ 1 \,\}$$

confirms that there are two eigenvalues less than λ = 2. 


Suppose we wish to compute $\lambda_k(T)$. From the Gershgorin theorem 
(Theorem 8.1.3) it follows that $\lambda_k(T) \in [y, z]$ where 

$$y = \min_{1\le i\le n}\left(a_i - |b_i| - |b_{i-1}|\right), \qquad z = \max_{1\le i\le n}\left(a_i + |b_i| + |b_{i-1}|\right)$$

if we define $b_0 = b_n = 0$. With these starting values, it is clear from the 
Sturm sequence property that the iteration 
while |z - y| > u(|y| + |z|) 
  x = (y + z)/2 
  if a(x) > n - k                              (8.5.3) 
    z = x 
  else 
    y = x 
  end 
end 


produces a sequence of subintervals that are repeatedly halved in length 
but which always contain $\lambda_k(T)$. 
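The count a(x) and iteration (8.5.3) can be sketched as follows, again on the T of Example 8.5.1. The zero-sign convention of Theorem 8.5.1 is applied explicitly. The names `sturm_count` and `kth_largest` and the `max_iter` cap are illustrative additions. Since only the converged brackets matter, the tests below check the final values rather than individual iterates.

```python
def sturm_count(a, b, x):
    """a(x) of Theorem 8.5.1: number of sign changes in p_0(x), ..., p_n(x).

    A zero p_r(x) is given the sign opposite to that of p_{r-1}(x).
    """
    p_prev, p = 1.0, a[0] - x
    prev_sign, count = 1, 0                   # sign of p_0 is +1
    for r in range(1, len(a) + 1):
        if r > 1:
            p_prev, p = p, (a[r - 1] - x) * p - b[r - 2] ** 2 * p_prev
        sign = (p > 0) - (p < 0)
        if sign == 0:
            sign = -prev_sign                 # convention of Theorem 8.5.1
        if sign != prev_sign:
            count += 1
        prev_sign = sign
    return count


def kth_largest(a, b, k, u=2.0 ** -53, max_iter=200):
    """Iteration (8.5.3) for lambda_k(T) from the Gershgorin interval.
    max_iter is an added safeguard."""
    n = len(a)
    bb = [0.0] + [abs(v) for v in b] + [0.0]  # |b_0|, ..., |b_n| with b_0 = b_n = 0
    y = min(a[i] - bb[i + 1] - bb[i] for i in range(n))
    z = max(a[i] + bb[i + 1] + bb[i] for i in range(n))
    for _ in range(max_iter):
        if abs(z - y) <= u * (abs(y) + abs(z)):
            break
        x = (y + z) / 2
        if sturm_count(a, b, x) > n - k:      # lambda_k(T) < x
            z = x
        else:
            y = x
    return (y + z) / 2


a = [1.0, 2.0, 3.0, 4.0]
b = [-1.0, -1.0, -1.0]
print(sturm_count(a, b, 2.0))                 # 2, as in Example 8.5.1
print([kth_largest(a, b, k) for k in range(1, 5)])
```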


Example 8.5.2 If (8.5.3) is applied to the matrix of Example 8.5.1 with k = 3, then 
the values shown in the following table are generated: 


   y        z        x       a(x) 
0.0000   5.0000   2.5000      2 
0.0000   2.5000   1.2500      1 
1.2500   2.5000   1.3750      1 
1.3750   2.5000   1.9375      2 
1.3750   1.9375   1.6563      1 
1.6563   1.9375   1.7969      1 


We conclude from the output that $\lambda_3(T) \in [1.7969, 1.9375]$. Note: $\lambda_3(T) \approx 1.82$. 


During the execution of (8.5.3), information about the location of other 
eigenvalues is obtained. By systematically keeping track of this information 
it is possible to devise an efficient scheme for computing “contiguous” 
subsets of λ(T), e.g., $\lambda_k(T), \lambda_{k+1}(T), \ldots, \lambda_{k+j}(T)$. See Barth, Martin, and 
Wilkinson (1967). 

If selected eigenvalues of a general symmetric matrix A are desired, 
then it is necessary first to compute the tridiagonalization $T = U_0^TAU_0$ 
before the above bisection schemes can be applied. This can be done using 
Algorithm 8.3.1 or by the Lanczos algorithm discussed in the next chapter. 
In either case, the corresponding eigenvectors can be readily found via 
inverse iteration since tridiagonal systems can be solved in O(n) flops. See 
§4.3.6 and §8.2.2. 

In those applications where the original matrix A already has tridiagonal 
form, bisection computes eigenvalues with small relative error, regardless of 
their magnitude. This is in contrast to the tridiagonal QR iteration, where 
the computed eigenvalues $\hat\lambda_i$ can be guaranteed only to have small absolute 
error: $|\hat\lambda_i - \lambda_i(T)| \approx \mathbf{u}\|T\|_2$. 

Finally, it is possible to compute specific eigenvalues of a symmetric matrix 
by using the $LDL^T$ factorization (see §4.2) and exploiting the Sylvester 
inertia theorem (Theorem 8.1.17). If 

$$A - \mu I = LDL^T, \qquad A = A^T \in \mathbb{R}^{n\times n},$$

is the $LDL^T$ factorization of $A - \mu I$ with $D = \mathrm{diag}(d_1,\ldots,d_n)$, then the 
number of negative $d_i$ equals the number of $\lambda_i(A)$ that are less than μ. See 
Parlett (1980, p. 46) for details. 
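For a tridiagonal T the $LDL^T$ pivots obey a two-term recurrence, so the inertia count costs O(n) flops. The sketch below assumes symmetric tridiagonal input. A general symmetric A, which the text allows, would need a full $LDL^T$ factorization. The sketch replaces an exactly zero pivot with a tiny negative number. That substitution is an assumption chosen to agree with the sign convention of Theorem 8.5.1 and is not stated in the text. In exact arithmetic $d_i = p_i(\mu)/p_{i-1}(\mu)$, so the count agrees with a(μ).

```python
def negcount_ldlt(a, b, mu, tiny=1e-300):
    """Number of negative pivots in T - mu*I = L D L^T for symmetric tridiagonal T.

    a holds the diagonal and b the subdiagonal. The pivots satisfy
    d_1 = a_1 - mu and d_i = (a_i - mu) - b_{i-1}^2 / d_{i-1}.
    By Sylvester's inertia theorem the count equals #{lambda_i(T) < mu}.
    """
    count = 0
    d = 1.0
    for i in range(len(a)):
        d = (a[i] - mu) - (b[i - 1] ** 2 / d if i > 0 else 0.0)
        if d == 0.0:
            d = -tiny     # assumption: an exact zero pivot is counted as negative
        if d < 0.0:
            count += 1
    return count


a = [1.0, 2.0, 3.0, 4.0]          # T of Example 8.5.1
b = [-1.0, -1.0, -1.0]
print([negcount_ldlt(a, b, mu) for mu in (0.0, 1.0, 2.0, 3.5, 5.0)])   # [0, 1, 2, 3, 4]
```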


8.5.3 Eigensystems of Diagonal Plus Rank-1 Matrices 


Our next method for the symmetric tridiagonal eigenproblem requires that 
we be able to compute efficiently the eigenvalues and eigenvectors of a 
matrix of the form $D + \rho zz^T$ where $D \in \mathbb{R}^{n\times n}$ is diagonal, $z \in \mathbb{R}^n$, and 
$\rho \in \mathbb{R}$. This problem is important in its own right and the key computations 
rest upon the following pair of results. 


Lemma 8.5.2 Suppose $D = \mathrm{diag}(d_1,\ldots,d_n) \in \mathbb{R}^{n\times n}$ has the property that 
$d_1 > \cdots > d_n$. Assume that ρ ≠ 0 and that $z \in \mathbb{R}^n$ has no zero components. If 

$$(D + \rho zz^T)v = \lambda v, \qquad v \neq 0,$$

then $z^Tv \neq 0$ and $D - \lambda I$ is nonsingular. 

Proof. If λ ∈ λ(D), then $\lambda = d_i$ for some i and thus 

$$0 = e_i^T\left[(D - \lambda I)v + \rho(z^Tv)z\right] = \rho(z^Tv)z_i.$$

Since ρ and $z_i$ are nonzero we must have $0 = z^Tv$ and so $Dv = \lambda v$. However, 
D has distinct eigenvalues and therefore, $v \in \mathrm{span}\{e_i\}$. But then 
$0 = z^Tv = v_iz_i$ with $v_i \neq 0$, a contradiction. Thus, D and $D + \rho zz^T$ do not have any 
common eigenvalues, and $z^Tv \neq 0$ (if $z^Tv = 0$, then $Dv = \lambda v$ and λ would be an 
eigenvalue of D). □ 


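Theorem 8.5.3 Suppose $D = \mathrm{diag}(d_1,\ldots,d_n) \in \mathbb{R}^{n\times n}$ and that the diagonal 
entries satisfy $d_1 > \cdots > d_n$. Assume that ρ ≠ 0 and that $z \in \mathbb{R}^n$ has 
no zero components. If $V \in \mathbb{R}^{n\times n}$ is orthogonal such that 

$$V^T(D + \rho zz^T)V = \mathrm{diag}(\lambda_1,\ldots,\lambda_n)$$

with $\lambda_1 \ge \cdots \ge \lambda_n$ and $V = [v_1,\ldots,v_n]$, then 

(a) The $\lambda_i$ are the n zeros of $f(\lambda) = 1 + \rho z^T(D - \lambda I)^{-1}z$. 

(b) If ρ > 0, then $\lambda_1 > d_1 > \lambda_2 > \cdots > \lambda_n > d_n$. 
    If ρ < 0, then $d_1 > \lambda_1 > d_2 > \cdots > d_n > \lambda_n$. 

(c) The eigenvector $v_i$ is a multiple of $(D - \lambda_iI)^{-1}z$. 

Proof. If $(D + \rho zz^T)v = \lambda v$, then 

$$(D - \lambda I)v + \rho(z^Tv)z = 0. \qquad (8.5.4)$$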
We know from Lemma 8.5.2 that $D - \lambda I$ is nonsingular. Thus, 

$$v \in \mathrm{span}\{(D - \lambda I)^{-1}z\},$$

thereby establishing (c). Moreover, if we apply $z^T(D - \lambda I)^{-1}$ to both sides 
of equation (8.5.4) we obtain 

$$z^Tv\left(1 + \rho z^T(D - \lambda I)^{-1}z\right) = 0.$$

By Lemma 8.5.2, $z^Tv \neq 0$ and so this shows that if $\lambda \in \lambda(D + \rho zz^T)$, then 
f(λ) = 0. We must show that all the zeros of f are eigenvalues of $D + \rho zz^T$ 
and that the interlacing relations (b) hold. 

To do this we look more carefully at the equations 

$$f(\lambda) = 1 + \rho\left(\frac{z_1^2}{d_1 - \lambda} + \cdots + \frac{z_n^2}{d_n - \lambda}\right),$$

$$f'(\lambda) = \rho\left(\frac{z_1^2}{(d_1 - \lambda)^2} + \cdots + \frac{z_n^2}{(d_n - \lambda)^2}\right).$$

Note that f is monotone in between its poles. This allows us to conclude 
that if ρ > 0, then f has precisely n roots, one in each of the intervals 

$$(d_n, d_{n-1}),\ \ldots,\ (d_2, d_1),\ (d_1, \infty).$$

If ρ < 0 then f has exactly n roots, one in each of the intervals 

$$(-\infty, d_n),\ (d_n, d_{n-1}),\ \ldots,\ (d_2, d_1).$$

In either case, it follows that the zeros of f are precisely the eigenvalues of 
$D + \rho zz^T$. □ 


The theorem suggests that to compute V we (a) find the roots $\lambda_1,\ldots,\lambda_n$ 
of f using a Newton-like procedure and then (b) compute the columns of 
V by normalizing the vectors $(D - \lambda_iI)^{-1}z$ for i = 1:n. The same plan of 
attack can be followed even if there are repeated $d_i$ and zero $z_i$. 
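The following Python sketch follows the plan just described for ρ > 0, distinct $d_i$, and nonzero $z_i$, with one substitution. The roots of f are found by plain bisection on the brackets from Theorem 8.5.3(b) rather than by a Newton-like method, which is slower but simple to get right. The upper bound $d_1 + \rho\|z\|_2^2$ for $\lambda_1$ is a point where f is nonnegative. Forming eigenvectors directly from $(D - \lambda_iI)^{-1}z$ can lose orthogonality when roots crowd the poles. That delicacy is analyzed in the references cited below and is not addressed here. The name `secular_eig` and the test data are illustrative.

```python
import math


def secular_eig(d, z, rho, iters=200):
    """Eigenpairs of D + rho*z*z^T for rho > 0, strictly decreasing d, nonzero z.

    Each root of f(lam) = 1 + rho * sum z_j^2 / (d_j - lam) is bracketed using
    Theorem 8.5.3(b) and located by bisection (f is increasing between poles).
    """
    def f(lam):
        return 1.0 + rho * sum(zj * zj / (dj - lam) for dj, zj in zip(d, z))

    znorm2 = sum(zj * zj for zj in z)
    lams, vecs = [], []
    for i in range(len(d)):
        lo = d[i]
        # lambda_1 <= d_1 + rho*||z||^2, where f >= 0; otherwise use the next pole up.
        hi = d[0] + rho * znorm2 if i == 0 else d[i - 1]
        for _ in range(iters):
            mid = 0.5 * (lo + hi)
            if mid <= lo or mid >= hi:
                break                              # bracket cannot shrink further
            if f(mid) < 0.0:
                lo = mid
            else:
                hi = mid
        lam = 0.5 * (lo + hi)
        v = [zj / (dj - lam) for dj, zj in zip(d, z)]   # (D - lam*I)^{-1} z, part (c)
        nrm = math.sqrt(sum(x * x for x in v))
        lams.append(lam)
        vecs.append([x / nrm for x in v])
    return lams, vecs


d = [4.0, 3.0, 2.0, 1.0]
z = [1.0, 1.0, 1.0, 1.0]
rho = 0.5
lams, vecs = secular_eig(d, z, rho)
print(lams)
```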


Theorem 8.5.4 If $D = \mathrm{diag}(d_1,\ldots,d_n)$ and $z \in \mathbb{R}^n$, then there exists an 
orthogonal matrix $V_1$ such that if $V_1^TDV_1 = \mathrm{diag}(\mu_1,\ldots,\mu_n)$ and 
$w = V_1^Tz$ then 

$$\mu_1 > \mu_2 > \cdots > \mu_r, \qquad \mu_{r+1} \ge \cdots \ge \mu_n,$$

$w_i \neq 0$ for i = 1:r, and $w_i = 0$ for i = r+1:n. 


Proof. We give a constructive proof based upon two elementary operations. 
(a) Suppose $d_i = d_j$ for some i < j. Let J(i,j,θ) be a Jacobi 
rotation in the (i,j) plane with the property that the jth component of 
$J(i,j,\theta)^Tz$ is zero. It is not hard to show that $J(i,j,\theta)^TDJ(i,j,\theta) = D$. 
Thus, we can zero a component of z if there is a repeated $d_i$. (b) If $z_i = 0$, 
$z_j \neq 0$, and i < j, then let P be the identity with columns i and j 
interchanged. It follows that $P^TDP$ is diagonal, $(P^Tz)_i \neq 0$, and $(P^Tz)_j = 0$. 
Thus, we can permute all the zero $z_i$ to the “bottom.” Clearly, repetition 
of (a) and (b) eventually renders the desired canonical structure. $V_1$ is the 
product of the rotations and permutations. □ 
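The two elementary operations in the proof can be coded directly. This sketch uses a hypothetical function `deflate` with 0-based indices. It applies the rotations of step (a), then performs the permutations of step (b) in one pass by sorting. It uses exact comparisons $d_i = d_j$ and $z_i = 0$, whereas a practical code would deflate with tolerances.

```python
import math


def deflate(d, z):
    """Constructive sketch of Theorem 8.5.4 (0-based indices).

    Returns (mu, w, V1, r) with V1^T diag(d) V1 = diag(mu), w = V1^T z,
    w[:r] nonzero with mu[:r] strictly decreasing, and w[r:] == 0.
    """
    n = len(d)
    z = [float(x) for x in z]
    V = [[float(i == j) for j in range(n)] for i in range(n)]
    # (a) For each repeated pair d_i = d_j (i < j), rotate so that z_j becomes 0.
    for i in range(n):
        for j in range(i + 1, n):
            if d[i] == d[j] and z[j] != 0.0:
                r = math.hypot(z[i], z[j])
                c, s = z[i] / r, -z[j] / r             # (J^T z)_j = s z_i + c z_j = 0
                z[i], z[j] = c * z[i] - s * z[j], 0.0   # (J^T z)_i = r
                for k in range(n):                      # V <- V J(i, j, theta)
                    vki, vkj = V[k][i], V[k][j]
                    V[k][i], V[k][j] = c * vki - s * vkj, s * vki + c * vkj
    # (b) Permute: nonzero z first, then zero z; each group by decreasing d.
    nonzero = sorted((k for k in range(n) if z[k] != 0.0), key=lambda k: -d[k])
    zero = sorted((k for k in range(n) if z[k] == 0.0), key=lambda k: -d[k])
    order = nonzero + zero
    mu = [d[k] for k in order]
    w = [z[k] for k in order]
    V1 = [[V[row][k] for k in order] for row in range(n)]
    return mu, w, V1, len(nonzero)


d0 = [3.0, 3.0, 2.0, 1.0, 1.0]
z0 = [1.0, 2.0, 0.0, 1.0, 1.0]
mu, w, V1, r = deflate(d0, z0)
print(r, mu, w)
```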


See Barlow (1993) and the references therein for a discussion of the solution 
procedures that we have outlined above. 


8.5.4 A Divide and Conquer Method 


We now present a divide-and-conquer method for computing the Schur 
decomposition 

$$Q^TTQ = \Lambda = \mathrm{diag}(\lambda_1,\ldots,\lambda_n), \qquad Q^TQ = I \qquad (8.5.5)$$

for tridiagonal T that involves (a) “tearing” T in half, (b) computing the 
Schur decompositions of the two parts, and (c) combining the two half-sized 
Schur decompositions into the required full size Schur decomposition. The 
overall procedure, developed by Dongarra and Sorensen (1987), is suitable 
for parallel computation. 

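We first show how T can be “torn” in half with a rank-one modification. 
For simplicity, assume n = 2m. Define $v \in \mathbb{R}^n$ as follows 

$$v = \begin{bmatrix} e_m\\ \theta e_1\end{bmatrix}, \qquad e_m, e_1 \in \mathbb{R}^m. \qquad (8.5.6)$$

Note that for all $\rho \in \mathbb{R}$ the matrix $\tilde T = T - \rho vv^T$ is identical to T except 
in its “middle four” entries: 

$$\tilde T(m{:}m+1,\, m{:}m+1) = \begin{bmatrix} a_m - \rho & b_m - \rho\theta\\ b_m - \rho\theta & a_{m+1} - \rho\theta^2\end{bmatrix}.$$

If we set $\rho\theta = b_m$ then 

$$T = \begin{bmatrix} T_1 & 0\\ 0 & T_2\end{bmatrix} + \rho vv^T$$

where 

$$T_1 = \begin{bmatrix} a_1 & b_1 & & 0\\ b_1 & a_2 & \ddots & \\ & \ddots & \ddots & b_{m-1}\\ 0 & & b_{m-1} & \tilde a_m\end{bmatrix}, \qquad T_2 = \begin{bmatrix} \tilde a_{m+1} & b_{m+1} & & 0\\ b_{m+1} & a_{m+2} & \ddots & \\ & \ddots & \ddots & b_{n-1}\\ 0 & & b_{n-1} & a_n\end{bmatrix},$$

and $\tilde a_m = a_m - \rho$ and $\tilde a_{m+1} = a_{m+1} - \rho\theta^2$. 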
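Now suppose that we have m-by-m orthogonal matrices $Q_1$ and $Q_2$ such 
that $Q_1^TT_1Q_1 = D_1$ and $Q_2^TT_2Q_2 = D_2$ are each diagonal. If we set 

$$U = \begin{bmatrix} Q_1 & 0\\ 0 & Q_2\end{bmatrix},$$

then 

$$U^TTU = U^T\left(\begin{bmatrix} T_1 & 0\\ 0 & T_2\end{bmatrix} + \rho vv^T\right)U = D + \rho zz^T$$

where 

$$D = \begin{bmatrix} D_1 & 0\\ 0 & D_2\end{bmatrix}$$

is diagonal and 

$$z = U^Tv = \begin{bmatrix} Q_1^Te_m\\ \theta Q_2^Te_1\end{bmatrix}.$$

Comparing these equations we see that the effective synthesis of the two 
half-sized Schur decompositions requires the quick and stable computation 
of an orthogonal V such that 

$$V^T(D + \rho zz^T)V = \Lambda = \mathrm{diag}(\lambda_1,\ldots,\lambda_n)$$

which we discussed in §8.5.3. 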
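The tearing identity is easy to confirm numerically. In this sketch θ is a free nonzero parameter (P8.5.2 asks how to choose it), and ρ is set so that ρθ = $b_m$. The function names and the 6-by-6 data are illustrative. The sketch builds $T_1$ and $T_2$ with the modified corner entries $\tilde a_m$ and $\tilde a_{m+1}$ and checks that $\mathrm{diag}(T_1, T_2) + \rho vv^T$ reproduces T.

```python
def tridiag_dense(a, b):
    """Dense symmetric tridiagonal matrix with diagonal a and off-diagonal b."""
    n = len(a)
    T = [[0.0] * n for _ in range(n)]
    for i in range(n):
        T[i][i] = a[i]
    for i in range(n - 1):
        T[i][i + 1] = T[i + 1][i] = b[i]
    return T


def tear(a, b, theta=1.0):
    """Split T (n = 2m) as diag(T1, T2) + rho v v^T with v = [e_m; theta e_1]."""
    m = len(a) // 2
    rho = b[m - 1] / theta                 # rho * theta = b_m
    a1, a2 = a[:m], a[m:]                  # slices are copies
    a1[-1] = a[m - 1] - rho                # a~_m = a_m - rho
    a2[0] = a[m] - rho * theta ** 2        # a~_{m+1} = a_{m+1} - rho theta^2
    return (a1, b[:m - 1]), (a2, b[m:]), rho


def assemble(a1, b1, a2, b2, rho, theta):
    """diag(T1, T2) + rho * v v^T."""
    m, n = len(a1), len(a1) + len(a2)
    T1, T2 = tridiag_dense(a1, b1), tridiag_dense(a2, b2)
    v = [0.0] * n
    v[m - 1], v[m] = 1.0, theta
    return [[(T1[i][j] if i < m and j < m else
              T2[i - m][j - m] if i >= m and j >= m else 0.0) + rho * v[i] * v[j]
             for j in range(n)] for i in range(n)]


a = [4.0, 3.0, 5.0, 2.0, 6.0, 1.0]        # hypothetical data, n = 6, m = 3
b = [1.0, -2.0, 0.5, 3.0, -1.0]
theta = 1.0
(a1, b1), (a2, b2), rho = tear(a, b, theta)
rebuilt = assemble(a1, b1, a2, b2, rho, theta)
print(a1, a2, rho)
```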


8.5.5 A Parallel Implementation 


Having stepped through the tearing and synthesis operations, we are ready 
to illustrate the overall process and how it can be implemented on a 
multiprocessor. For clarity, assume that n = 8N for some positive integer N 
and that three levels of tearing are performed. We can depict this with a 
binary tree as shown in Fig. 8.5.1. The indices are specified in binary. 
Fig. 8.5.2 depicts a single node and should be interpreted to mean that 
the eigensystem for the tridiagonal T(b) is obtained from the eigensystems 
of the tridiagonals T(b0) and T(b1). For example, the eigensystems for the 
N-by-N matrices T(110) and T(111) are combined to produce the eigensystem 
for the 2N-by-2N tridiagonal matrix T(11). 
                                   T 
                 T(0)                              T(1) 
        T(00)           T(01)             T(10)           T(11) 
    T(000)  T(001)  T(010)  T(011)    T(100)  T(101)  T(110)  T(111) 


FIGURE 8.5.1 Computation Tree 


         T(b) 
        /    \ 
    T(b0)    T(b1) 


FIGURE 8.5.2 Synthesis at a Node 


With tree-structured algorithms there is always the danger that parallelism 
is lost as the tree is “climbed” towards the root, but this is not the 
case in our problem. To see this suppose we have 8 processors and that the 
first task of Proc(b) is to compute the Schur decomposition of T(b) where 
b = 000, 001, 010, 011, 100, 101, 110, 111. This portion of the computation is 
perfectly load balanced and does not involve interprocessor communication. 
(We are ignoring the Theorem 8.5.4 deflations, which are unlikely to cause 
significant load imbalance.) 


At the next level there are four gluing operations to perform: T(00), 
T(01), T(10), T(11). However, each of these computations neatly subdivides 
and we can assign two processors to each task. For example, once 
the secular equation that underlies the T(00) synthesis is known to both 
Proc(000) and Proc(001), then they each can go about getting half of the 
eigenvalues and corresponding eigenvectors. Likewise, 4 processors can each 
be assigned to the T(0) and T(1) problems. All 8 processors can participate 
in computing the eigensystem of T. Thus, at every level full parallelism 
can be maintained because the eigenvalue/eigenvector computations are 
independent of one another. 


Problems 


P8.5.1 Suppose λ is an eigenvalue of a symmetric tridiagonal matrix T. Show that if 
λ has algebraic multiplicity k, then at least k - 1 of T's subdiagonal elements are zero. 

P8.5.2 Give an algorithm for determining ρ and θ in (8.5.6) with the property that 
θ ∈ {-1, 1} and $\min(|a_m - \rho|, |a_{m+1} - \rho|)$ is maximized. 

P8.5.3 Let $p_r(\lambda) = \det(T(1{:}r, 1{:}r) - \lambda I_r)$ where T is given by (8.5.1). Derive a 
recursion for evaluating $p_n'(\lambda)$ and use it to develop a Newton iteration that can compute 
eigenvalues of T. 

P8.5.4 What communication is necessary between the processors assigned to a particular 
$T_b$? Is it possible to share the work associated with the processing of repeated $d_i$ 
and zero $z_i$? 


P8.5.5 If T is positive definite, does it follow that the matrices $T_1$ and $T_2$ in §8.5.4 are 
positive definite? 


P8.5.6 Suppose that 

$$A = \begin{bmatrix} D & v\\ v^T & d_n\end{bmatrix}$$

where $D = \mathrm{diag}(d_1,\ldots,d_{n-1})$ has distinct diagonal entries and $v \in \mathbb{R}^{n-1}$ has no zero 
entries. (a) Show that if λ ∈ λ(A), then $D - \lambda I_{n-1}$ is nonsingular. (b) Show that if 
λ ∈ λ(A), then λ is a zero of 

$$f(\lambda) = \lambda + \sum_{k=1}^{n-1}\frac{v_k^2}{d_k - \lambda} - d_n .$$


P8.5.7 Suppose $A = S + \sigma uu^T$ where $S \in \mathbb{R}^{n\times n}$ is skew-symmetric, $u \in \mathbb{R}^n$, and $\sigma \in \mathbb{R}$. 
Show how to compute an orthogonal Q such that $Q^TAQ = T + \sigma e_1e_1^T$ where T is 
tridiagonal and skew-symmetric and $e_1$ is the first column of $I_n$. 


P8.5.8 It is known that λ ∈ λ(T) where $T \in \mathbb{R}^{n\times n}$ is symmetric and tridiagonal with 
no zero subdiagonal entries. Show how to compute x(1:n-1) from the equation Tx = λx 
given that $x_n = 1$. 


Notes and References for Sec. 8.5 


Bisection/Sturm sequence methods are discussed in 


W. Barth, R.S. Martin, and J.H. Wilkinson (1967). “Calculation of the Eigenvalues of 
a Symmetric Tridiagonal Matrix by the Method of Bisection,” Numer. Math. 9, 
386-93. See also Wilkinson and Reinsch (1971, pp. 249-256). 

K.K. Gupta (1972). "Solution of Eigenvalue Problems by Sturm Sequence Method," Int. 
J. Numer. Meth. Eng. 4, 379-404. 


Various aspects of the divide and conquer algorithm discussed in this section are detailed in 


G.H. Golub (1973). "Some Modified Matrix Eigenvalue Problems," SIAM Review 15, 318-44.

J.R. Bunch, C.P. Nielsen, and D.C. Sorensen (1978). "Rank-One Modification of the Symmetric Eigenproblem," Numer. Math. 31, 31-48.

J.J.M. Cuppen (1981). "A Divide and Conquer Method for the Symmetric Eigenproblem," Numer. Math. 36, 177-95.




J.J. Dongarra and D.C. Sorensen (1987). "A Fully Parallel Algorithm for the Symmetric Eigenvalue Problem," SIAM J. Sci. and Stat. Comp. 8, S139-S154.

S. Crivelli and E.R. Jessup (1995). "The Cost of Eigenvalue Computation on Distributed Memory MIMD Computers," Parallel Computing 21, 401-422.


The very delicate computations required by the method are carefully analyzed in 


J.L. Barlow (1993). "Error Analysis of Update Methods for the Symmetric Eigenvalue Problem," SIAM J. Matrix Anal. Appl. 14, 598-618.


Various generalizations to banded symmetric eigenproblems have been explored in


P. Arbenz, W. Gander, and G.H. Golub (1988). "Restricted Rank Modification of the Symmetric Eigenvalue Problem: Theoretical Considerations," Lin. Alg. and Its Applic. 104, 75-95.

P. Arbenz and G.H. Golub (1988). "On the Spectral Decomposition of Hermitian Matrices Subject to Indefinite Low Rank Perturbations with Applications," SIAM J. Matrix Anal. Appl. 9, 40-58.


A related divide and conquer method based on the "arrowhead" matrix (see P8.5.6) is given in


M. Gu and S.C. Eisenstat (1995). "A Divide-and-Conquer Algorithm for the Symmetric Tridiagonal Eigenproblem," SIAM J. Matrix Anal. Appl. 16, 172-191.


8.6 Computing the SVD 


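There are important relationships between the singular value decomposition of a matrix $A$ and the Schur decompositions of the symmetric matrices
$$A^TA, \qquad AA^T, \qquad \begin{bmatrix} 0 & A^T \\ A & 0 \end{bmatrix}.$$
If
$$U^TAV = \mathrm{diag}(\sigma_1,\ldots,\sigma_n)$$
is the SVD of $A \in \mathbb{R}^{m\times n}$ $(m \geq n)$, then
$$V^T(A^TA)V = \mathrm{diag}(\sigma_1^2,\ldots,\sigma_n^2) \in \mathbb{R}^{n\times n} \qquad (8.6.1)$$
and
$$U^T(AA^T)U = \mathrm{diag}(\sigma_1^2,\ldots,\sigma_n^2,\underbrace{0,\ldots,0}_{m-n}) \in \mathbb{R}^{m\times m}. \qquad (8.6.2)$$
Moreover, if
$$U = [\,\underbrace{U_1}_{n} \;\; \underbrace{U_2}_{m-n}\,]$$
and we define the orthogonal matrix $Q \in \mathbb{R}^{(m+n)\times(m+n)}$ by
$$Q = \frac{1}{\sqrt{2}}\begin{bmatrix} V & V & 0 \\ U_1 & -U_1 & \sqrt{2}\,U_2 \end{bmatrix}$$
then
$$Q^T\begin{bmatrix} 0 & A^T \\ A & 0 \end{bmatrix}Q = \mathrm{diag}(\sigma_1,\ldots,\sigma_n,-\sigma_1,\ldots,-\sigma_n,\underbrace{0,\ldots,0}_{m-n}). \qquad (8.6.3)$$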


These connections to the symmetric eigenproblem allow us to adapt the 
mathematical and algorithmic developments of the previous sections to the 
singular value problem. Good references for this section include Lawson 
and Hanson (1974) and Stewart and Sun (1990). 
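As a numerical check of (8.6.3), the sketch below forms $\begin{bmatrix} 0 & A^T \\ A & 0 \end{bmatrix}$ for the 3-by-2 matrix used in Example 8.6.1 and computes its eigenvalues. It is plain Python with no external libraries. The helper names `augmented` and `jacobi_eigenvalues` are hypothetical, and a cyclic Jacobi iteration (§8.4) stands in for a production symmetric eigensolver. The five eigenvalues should come out as $\pm\sigma_1$, $\pm\sigma_2$, and one zero, since $m - n = 1$.

```python
import math


def augmented(A):
    """Return the (m+n)-by-(m+n) symmetric matrix [[0, A^T], [A, 0]]."""
    m, n = len(A), len(A[0])
    M = [[0.0] * (m + n) for _ in range(m + n)]
    for i in range(m):
        for j in range(n):
            M[n + i][j] = A[i][j]   # lower-left block: A
            M[j][n + i] = A[i][j]   # upper-right block: A^T
    return M


def jacobi_eigenvalues(S, tol=1e-14, max_sweeps=50):
    """Eigenvalues of a symmetric matrix by the cyclic Jacobi method, largest first."""
    n = len(S)
    A = [list(map(float, row)) for row in S]
    for _ in range(max_sweeps):
        off = math.sqrt(sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j))
        if off <= tol * math.sqrt(sum(x * x for row in A for x in row)):
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if A[p][q] == 0.0:
                    continue
                # 2-by-2 symmetric Schur decomposition: c, s chosen to zero A[p][q]
                tau = (A[q][q] - A[p][p]) / (2.0 * A[p][q])
                t = (1.0 if tau >= 0 else -1.0) / (abs(tau) + math.sqrt(1.0 + tau * tau))
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = t * c
                for row in A:                      # A <- A J(p, q, theta)
                    rp, rq = row[p], row[q]
                    row[p], row[q] = c * rp - s * rq, s * rp + c * rq
                rp, rq = A[p], A[q]                # A <- J(p, q, theta)^T A
                A[p] = [c * x - s * y for x, y in zip(rp, rq)]
                A[q] = [s * x + c * y for x, y in zip(rp, rq)]
    return sorted((A[i][i] for i in range(n)), reverse=True)


A = [[1.0, 4.0], [2.0, 5.0], [3.0, 6.0]]
print(jacobi_eigenvalues(augmented(A)))
```

The printed values should agree with $\pm 9.5080$, $\pm .7729$ and $0$ to about four digits.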


8.6.1 Perturbation Theory and Properties 


We first establish perturbation results for the SVD based on the theorems 
of §8.1. Recall that $\sigma_i(A)$ denotes the $i$th largest singular value of $A$.


Theorem 8.6.1 If $A \in \mathbb{R}^{m\times n}$, then for $k = 1{:}\min\{m,n\}$
$$\sigma_k(A) = \max_{\substack{\dim(S)=k \\ \dim(T)=k}} \; \min_{\substack{x \in S \\ y \in T}} \frac{y^TAx}{\|x\|_2\|y\|_2} = \max_{\dim(S)=k} \; \min_{x \in S} \frac{\|Ax\|_2}{\|x\|_2}.$$
Note that in this expression $S \subseteq \mathbb{R}^n$ and $T \subseteq \mathbb{R}^m$ are subspaces.

Proof. The right-most characterization follows by applying Theorem 8.1.2 to $A^TA$. The remainder of the proof we leave as an exercise. □


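Corollary 8.6.2 If $A$ and $A+E$ are in $\mathbb{R}^{m\times n}$ with $m \geq n$, then for $k = 1{:}n$
$$|\sigma_k(A+E) - \sigma_k(A)| \leq \sigma_1(E) = \|E\|_2.$$
Proof. Apply Corollary 8.1.6 to
$$\begin{bmatrix} 0 & A^T \\ A & 0 \end{bmatrix} \quad\text{and}\quad \begin{bmatrix} 0 & (A+E)^T \\ A+E & 0 \end{bmatrix}. \quad □$$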


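Example 8.6.1 If
$$A = \begin{bmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{bmatrix} \quad\text{and}\quad A+E = \begin{bmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6.01 \end{bmatrix}$$
then $\sigma(A) = \{9.5080, .7729\}$ and $\sigma(A+E) = \{9.5145, .7706\}$. It is clear that for $i = 1{:}2$ we have $|\sigma_i(A+E) - \sigma_i(A)| \leq \|E\|_2 = .01$.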


Corollary 8.6.3 Let $A = [a_1,\ldots,a_n] \in \mathbb{R}^{m\times n}$ be a column partitioning with $m \geq n$. If $A_r = [a_1,\ldots,a_r]$, then for $r = 1{:}n-1$
$$\sigma_1(A_{r+1}) \geq \sigma_1(A_r) \geq \sigma_2(A_{r+1}) \geq \cdots \geq \sigma_r(A_{r+1}) \geq \sigma_r(A_r) \geq \sigma_{r+1}(A_{r+1}).$$
Proof. Apply Corollary 8.1.7 to $A^TA$. □


This last result says that when a column is added to a matrix, the largest singular value cannot decrease and the smallest singular value cannot increase.


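Example 8.6.2
$$A = \begin{bmatrix} 1 & 6 & 11 \\ 2 & 7 & 12 \\ 3 & 8 & 13 \\ 4 & 9 & 14 \\ 5 & 10 & 15 \end{bmatrix} \quad\Rightarrow\quad \begin{array}{l} \sigma(A_1) = \{7.4162\} \\ \sigma(A_2) = \{19.5377,\ 1.8095\} \\ \sigma(A_3) = \{35.1272,\ 2.4654,\ 0.0000\} \end{array}$$
thereby confirming Corollary 8.6.3.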


The next result is a Wielandt-Hoffman theorem for singular values: 


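Theorem 8.6.4 If $A$ and $A+E$ are in $\mathbb{R}^{m\times n}$ with $m \geq n$, then
$$\sum_{k=1}^{n}\left(\sigma_k(A+E) - \sigma_k(A)\right)^2 \leq \|E\|_F^2.$$

Proof. Apply Theorem 8.1.4 to $\begin{bmatrix} 0 & A^T \\ A & 0 \end{bmatrix}$ and $\begin{bmatrix} 0 & (A+E)^T \\ A+E & 0 \end{bmatrix}$. □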


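Example 8.6.3 If
$$A = \begin{bmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{bmatrix} \quad\text{and}\quad A+E = \begin{bmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6.01 \end{bmatrix}$$
then
$$\sum_{k=1}^{2}\left(\sigma_k(A+E) - \sigma_k(A)\right)^2 = .472 \times 10^{-4} \leq 10^{-4} = \|E\|_F^2.$$
See Example 8.6.1.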
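Corollary 8.6.2 and Theorem 8.6.4 can be checked on Examples 8.6.1 and 8.6.3 with a few lines of plain Python. The sketch assumes an $m$-by-2 matrix and gets its singular values from the closed-form eigenvalues of the 2-by-2 Gram matrix $A^TA$. That shortcut squares the condition number (see §8.6.2), so it is used here only because the example is tiny and well scaled. The function name is hypothetical.

```python
import math


def singular_values_two_columns(A):
    """Singular values (largest first) of an m-by-2 matrix via its 2-by-2 Gram matrix.

    Forming A^T A squares the condition number; acceptable only for this
    small, well-scaled illustration.
    """
    a = sum(r[0] * r[0] for r in A)
    b = sum(r[0] * r[1] for r in A)
    d = sum(r[1] * r[1] for r in A)
    mid = 0.5 * (a + d)
    rad = math.hypot(0.5 * (a - d), b)   # eigenvalues of [[a, b], [b, d]] are mid +/- rad
    return [math.sqrt(mid + rad), math.sqrt(max(mid - rad, 0.0))]


A = [[1.0, 4.0], [2.0, 5.0], [3.0, 6.0]]
A_plus_E = [[1.0, 4.0], [2.0, 5.0], [3.0, 6.01]]
norm_E = 0.01   # E has a single nonzero entry, so ||E||_2 = ||E||_F = 0.01

s = singular_values_two_columns(A)
s_pert = singular_values_two_columns(A_plus_E)
diffs = [abs(x - y) for x, y in zip(s_pert, s)]

print("sigma(A)     =", s)
print("sigma(A + E) =", s_pert)
print("Corollary 8.6.2 bound holds:", max(diffs) <= norm_E)
print("sum of squared differences:", sum(t * t for t in diffs))
```

The sum of squared differences should be about $.472 \times 10^{-4}$, below $\|E\|_F^2 = 10^{-4}$ as Theorem 8.6.4 requires.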


For $A \in \mathbb{R}^{m\times n}$ we say that the $k$-dimensional subspaces $S \subseteq \mathbb{R}^n$ and $T \subseteq \mathbb{R}^m$ form a singular subspace pair if $x \in S$ and $y \in T$ imply $Ax \in T$ and $A^Ty \in S$. The following result is concerned with the perturbation of singular subspace pairs.


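Theorem 8.6.5 Let $A, E \in \mathbb{R}^{m\times n}$ with $m \geq n$ be given and suppose that $V \in \mathbb{R}^{n\times n}$ and $U \in \mathbb{R}^{m\times m}$ are orthogonal. Assume that
$$V = [\,\underbrace{V_1}_{r} \;\; \underbrace{V_2}_{n-r}\,], \qquad U = [\,\underbrace{U_1}_{r} \;\; \underbrace{U_2}_{m-r}\,]$$
and that $\mathrm{ran}(V_1)$ and $\mathrm{ran}(U_1)$ form a singular subspace pair for $A$. Let
$$U^TAV = \begin{bmatrix} A_{11} & 0 \\ 0 & A_{22} \end{bmatrix}, \qquad U^TEV = \begin{bmatrix} E_{11} & E_{12} \\ E_{21} & E_{22} \end{bmatrix},$$
where both matrices are partitioned conformally with $A_{11} \in \mathbb{R}^{r\times r}$ and $A_{22} \in \mathbb{R}^{(m-r)\times(n-r)}$, and assume that
$$\delta = \min_{\substack{\sigma \in \sigma(A_{11}) \\ \gamma \in \sigma(A_{22})}} |\sigma - \gamma| > 0.$$
If
$$\|E\|_F \leq \frac{\delta}{5},$$
then there exist matrices $P \in \mathbb{R}^{(m-r)\times r}$ and $Q \in \mathbb{R}^{(n-r)\times r}$ satisfying
$$\left\| \begin{bmatrix} Q \\ P \end{bmatrix} \right\|_F \leq 4\,\frac{\|E\|_F}{\delta}$$
such that $\mathrm{ran}(V_1 + V_2Q)$ and $\mathrm{ran}(U_1 + U_2P)$ is a singular subspace pair for $A + E$.

Proof. See Stewart (1973), Theorem 6.4. □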


Roughly speaking, the theorem says that $O(\epsilon)$ changes in $A$ can alter a singular subspace by an amount $\epsilon/\delta$, where $\delta$ measures the separation of the relevant singular values.


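Example 8.6.4 The matrix $A = \mathrm{diag}(2.000, 1.001, .999) \in \mathbb{R}^{4\times 3}$ has singular subspace pairs $(\mathrm{span}\{v_i\}, \mathrm{span}\{u_i\})$ for $i = 1, 2, 3$, where $v_i = e_i \in \mathbb{R}^3$ and $u_i = e_i \in \mathbb{R}^4$. Suppose
$$A + E = \begin{bmatrix} 2.000 & .010 & .010 \\ .010 & 1.001 & .010 \\ .010 & .010 & .999 \\ .010 & .010 & .010 \end{bmatrix}.$$
The corresponding columns of the matrices
$$\hat U = [\hat u_1\ \hat u_2\ \hat u_3] = \begin{bmatrix} .9999 & -.0144 & .0007 \\ .0101 & .7415 & .6708 \\ .0101 & .6707 & -.7416 \\ .0051 & -.0138 & -.0007 \end{bmatrix}, \qquad \hat V = [\hat v_1\ \hat v_2\ \hat v_3] = \begin{bmatrix} .9999 & -.0143 & .0007 \\ .0101 & .7416 & .6708 \\ .0101 & .6707 & -.7416 \end{bmatrix}$$
define singular subspace pairs for $A + E$. Note that the pair $\{\mathrm{span}\{\hat v_i\}, \mathrm{span}\{\hat u_i\}\}$ is close to $\{\mathrm{span}\{v_i\}, \mathrm{span}\{u_i\}\}$ for $i = 1$ but not for $i = 2$ or $3$. On the other hand, the singular subspace pair $\{\mathrm{span}\{\hat v_2, \hat v_3\}, \mathrm{span}\{\hat u_2, \hat u_3\}\}$ is close to $\{\mathrm{span}\{v_2, v_3\}, \mathrm{span}\{u_2, u_3\}\}$. This is what Theorem 8.6.5 predicts: $\sigma_2$ and $\sigma_3$ are separated by only .002, whereas the pair $\{\sigma_2, \sigma_3\}$ is well separated from $\sigma_1$.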
8.6.2 The SVD Algorithm


We now show how a variant of the QR algorithm can be used to compute the SVD of an $A \in \mathbb{R}^{m\times n}$ with $m \geq n$. At first glance, this appears straightforward. Equation (8.6.1) suggests that we

  • form $C = A^TA$,
  • use the symmetric QR algorithm to compute $V_1^TCV_1 = \mathrm{diag}(\sigma_i^2)$,
  • apply QR with column pivoting to $AV_1$, obtaining $U^T(AV_1)\Pi = R$.

Since $R$ has orthogonal columns, it follows that $U^TA(V_1\Pi)$ is diagonal. However, as we saw in Example 5.3.2, the formation of $A^TA$ can lead to a loss of information. The situation is not quite so bad here, since the original $A$ is used to compute $U$.

A preferable method for computing the SVD is described in Golub and Kahan (1965). Their technique finds $U$ and $V$ simultaneously by implicitly applying the symmetric QR algorithm to $A^TA$. The first step is to reduce $A$ to upper bidiagonal form using Algorithm 5.4.2:
$$U_B^TAV_B = \begin{bmatrix} B \\ 0 \end{bmatrix}, \qquad B = \begin{bmatrix} d_1 & f_1 & & 0 \\ & d_2 & \ddots & \\ & & \ddots & f_{n-1} \\ 0 & & & d_n \end{bmatrix} \in \mathbb{R}^{n\times n}.$$

The remaining problem is thus to compute the SVD of $B$. To this end, consider applying an implicit-shift QR step (Algorithm 8.3.2) to the tridiagonal matrix $T = B^TB$:

  • Compute the eigenvalue $\lambda$ of
$$T(m{:}n, m{:}n) = \begin{bmatrix} d_m^2 + f_{m-1}^2 & d_mf_m \\ d_mf_m & d_n^2 + f_m^2 \end{bmatrix}, \qquad m = n-1,$$
    that is closer to $d_n^2 + f_m^2$.
  • Compute $c_1 = \cos(\theta_1)$ and $s_1 = \sin(\theta_1)$ such that
$$\begin{bmatrix} c_1 & s_1 \\ -s_1 & c_1 \end{bmatrix}^T \begin{bmatrix} d_1^2 - \lambda \\ d_1f_1 \end{bmatrix} = \begin{bmatrix} \times \\ 0 \end{bmatrix}$$
    and set $G_1 = G(1, 2, \theta_1)$.
  • Compute Givens rotations $G_2,\ldots,G_{n-1}$ so that if $Q = G_1\cdots G_{n-1}$, then $Q^TTQ$ is tridiagonal and $Qe_1 = G_1e_1$.

Note that these calculations require the explicit formation of $B^TB$, which, as we have seen, is unwise from the numerical standpoint.

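Suppose instead that we apply the Givens rotation $G_1$ above to $B$ directly. Illustrating with the $n = 6$ case, this gives
$$B \leftarrow BG_1 = \begin{bmatrix} \times & \times & 0 & 0 & 0 & 0 \\ + & \times & \times & 0 & 0 & 0 \\ 0 & 0 & \times & \times & 0 & 0 \\ 0 & 0 & 0 & \times & \times & 0 \\ 0 & 0 & 0 & 0 & \times & \times \\ 0 & 0 & 0 & 0 & 0 & \times \end{bmatrix}.$$
We then can determine Givens rotations $U_1, V_2, U_2, \ldots, V_{n-1}$, and $U_{n-1}$ to chase the unwanted nonzero element down the bidiagonal:
$$B \leftarrow U_1^TB = \begin{bmatrix} \times & \times & + & 0 & 0 & 0 \\ 0 & \times & \times & 0 & 0 & 0 \\ 0 & 0 & \times & \times & 0 & 0 \\ 0 & 0 & 0 & \times & \times & 0 \\ 0 & 0 & 0 & 0 & \times & \times \\ 0 & 0 & 0 & 0 & 0 & \times \end{bmatrix}, \qquad B \leftarrow BV_2 = \begin{bmatrix} \times & \times & 0 & 0 & 0 & 0 \\ 0 & \times & \times & 0 & 0 & 0 \\ 0 & + & \times & \times & 0 & 0 \\ 0 & 0 & 0 & \times & \times & 0 \\ 0 & 0 & 0 & 0 & \times & \times \\ 0 & 0 & 0 & 0 & 0 & \times \end{bmatrix},$$
$$B \leftarrow U_2^TB = \begin{bmatrix} \times & \times & 0 & 0 & 0 & 0 \\ 0 & \times & \times & + & 0 & 0 \\ 0 & 0 & \times & \times & 0 & 0 \\ 0 & 0 & 0 & \times & \times & 0 \\ 0 & 0 & 0 & 0 & \times & \times \\ 0 & 0 & 0 & 0 & 0 & \times \end{bmatrix},$$
and so on. The process terminates with a new bidiagonal $\bar B$ that is related to $B$ as follows:
$$\bar B = (U_{n-1}^T\cdots U_1^T)\,B\,(G_1V_2\cdots V_{n-1}) = \bar U^TB\bar V.$$
Since each $V_i$ has the form $V_i = G(i, i+1, \theta_i)$ where $i = 2{:}n-1$, it follows that $\bar Ve_1 = Qe_1$. By the implicit Q theorem we can assert that $\bar V$ and $Q$ are essentially the same. Thus, we can implicitly effect the transition from $T$ to $\bar T = \bar B^T\bar B$ by working directly on the bidiagonal matrix $B$.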

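Of course, for these claims to hold it is necessary that the underlying tridiagonal matrices be unreduced. Since the subdiagonal entries of $B^TB$ are of the form $d_kf_k$, it is clear that we must search the bidiagonal band for zeros. If $f_k = 0$ for some $k$, then
$$B = \begin{bmatrix} B_1 & 0 \\ 0 & B_2 \end{bmatrix}, \qquad B_1 \in \mathbb{R}^{k\times k},\ B_2 \in \mathbb{R}^{(n-k)\times(n-k)},$$
and the original SVD problem decouples into two smaller problems involving the matrices $B_1$ and $B_2$. If $d_k = 0$ for some $k < n$, then premultiplication by a sequence of Givens transformations can zero $f_k$. For example, if $n = 6$ and $k = 3$, then by rotating in row planes (3,4), (3,5), and (3,6) we can zero the entire third row:
$$B = \begin{bmatrix} \times & \times & 0 & 0 & 0 & 0 \\ 0 & \times & \times & 0 & 0 & 0 \\ 0 & 0 & 0 & \times & 0 & 0 \\ 0 & 0 & 0 & \times & \times & 0 \\ 0 & 0 & 0 & 0 & \times & \times \\ 0 & 0 & 0 & 0 & 0 & \times \end{bmatrix} \xrightarrow{(3,4)} \begin{bmatrix} \times & \times & 0 & 0 & 0 & 0 \\ 0 & \times & \times & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & + & 0 \\ 0 & 0 & 0 & \times & \times & 0 \\ 0 & 0 & 0 & 0 & \times & \times \\ 0 & 0 & 0 & 0 & 0 & \times \end{bmatrix}$$
$$\xrightarrow{(3,5)} \begin{bmatrix} \times & \times & 0 & 0 & 0 & 0 \\ 0 & \times & \times & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & + \\ 0 & 0 & 0 & \times & \times & 0 \\ 0 & 0 & 0 & 0 & \times & \times \\ 0 & 0 & 0 & 0 & 0 & \times \end{bmatrix} \xrightarrow{(3,6)} \begin{bmatrix} \times & \times & 0 & 0 & 0 & 0 \\ 0 & \times & \times & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & \times & \times & 0 \\ 0 & 0 & 0 & 0 & \times & \times \\ 0 & 0 & 0 & 0 & 0 & \times \end{bmatrix}$$
If $d_n = 0$, then the last column can be zeroed with a series of column rotations in planes $(n-1, n), (n-2, n), \ldots, (1, n)$. Thus, we can decouple if $f_1\cdots f_{n-1} = 0$ or $d_1\cdots d_n = 0$.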
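The row-zeroing step for $d_k = 0$ can be sketched in plain Python as follows. The sketch assumes a dense upper bidiagonal matrix stored as a list of lists with 0-based indices, and the function name is hypothetical. Each rotation combines row $k$ with row $j$. It uses the diagonal entry $b_{jj}$ to annihilate the current unwanted entry in row $k$, which pushes a new one into column $j+1$, exactly as in the $n = 6$ illustration.

```python
import math


def zero_row_of_bidiagonal(B, k):
    """Zero row k of an upper bidiagonal B (list of lists, 0-based) when B[k][k] == 0.

    Row rotations in the planes (k, k+1), (k, k+2), ..., (k, n-1) push the single
    nonzero entry of row k to the right until it falls off the end.
    """
    n = len(B)
    if B[k][k] != 0.0:
        raise ValueError("this sketch assumes a zero diagonal entry B[k][k]")
    for j in range(k + 1, n):
        bulge = B[k][j]                  # the only nonzero left in row k
        if bulge == 0.0:
            break                        # nothing left to chase
        r = math.hypot(B[j][j], bulge)
        c, s = B[j][j] / r, bulge / r
        for col in range(j, n):          # rows j and k are zero left of column j
            bj, bk = B[j][col], B[k][col]
            B[j][col] = c * bj + s * bk
            B[k][col] = -s * bj + c * bk
        B[k][j] = 0.0                    # annihilated by construction
    return B


# hypothetical 4-by-4 example with a zero in the (2, 2) position (1-based)
B = [[2.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 4.0, 0.0],
     [0.0, 0.0, 3.0, 5.0],
     [0.0, 0.0, 0.0, 1.0]]
zero_row_of_bidiagonal(B, 1)
for row in B:
    print(row)
```

Afterwards the second row is zero and the matrix is still upper bidiagonal. Because only orthogonal row operations were applied, the Frobenius norm is unchanged.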


Algorithm 8.6.1 (Golub-Kahan SVD Step) Given a bidiagonal matrix $B \in \mathbb{R}^{n\times n}$ having no zeros on its diagonal or superdiagonal, the following algorithm overwrites $B$ with the bidiagonal matrix $\bar B = \bar U^TB\bar V$ where $\bar U$ and $\bar V$ are orthogonal and $\bar V$ is essentially the orthogonal matrix that would be obtained by applying Algorithm 8.3.2 to $T = B^TB$.

    Let $\mu$ be the eigenvalue of the trailing 2-by-2 submatrix of $T = B^TB$
        that is closer to $t_{nn}$.
    $y = t_{11} - \mu$
    $z = t_{12}$
    for $k = 1{:}n-1$
        Determine $c = \cos(\theta)$ and $s = \sin(\theta)$ such that
            $[\,y \;\; z\,]\begin{bmatrix} c & s \\ -s & c \end{bmatrix} = [\,* \;\; 0\,]$
        $B = B\,G(k, k+1, \theta)$
        $y = b_{kk};\ z = b_{k+1,k}$
        Determine $c = \cos(\theta)$ and $s = \sin(\theta)$ such that
            $\begin{bmatrix} c & s \\ -s & c \end{bmatrix}^T\begin{bmatrix} y \\ z \end{bmatrix} = \begin{bmatrix} * \\ 0 \end{bmatrix}$
        $B = G(k, k+1, \theta)^TB$
        if $k < n-1$
            $y = b_{k,k+1};\ z = b_{k,k+2}$
        end
    end
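Algorithm 8.6.1 translates almost line for line into the plain-Python sketch below. For readability it stores $B$ as a dense list of lists with 0-based indices, not in diagonal and superdiagonal vectors, so it does not attain the flop count quoted next. The function names are hypothetical, and `givens` follows the stable Givens computation of Chapter 5. The bulge positions that the rotations annihilate are set to exact zeros, as an implementation would do.

```python
import math


def givens(a, b):
    """Return (c, s) with [c s; -s c]^T [a; b] = [r; 0]."""
    if b == 0.0:
        return 1.0, 0.0
    if abs(b) > abs(a):
        tau = -a / b
        s = 1.0 / math.sqrt(1.0 + tau * tau)
        return s * tau, s
    tau = -b / a
    c = 1.0 / math.sqrt(1.0 + tau * tau)
    return c, c * tau


def golub_kahan_step(B):
    """One Golub-Kahan SVD step on a dense, unreduced upper bidiagonal B (n >= 2)."""
    n = len(B)
    # Wilkinson shift: eigenvalue of the trailing 2-by-2 of T = B^T B closer to t_nn
    f_before = B[n - 3][n - 2] if n >= 3 else 0.0
    t_aa = B[n - 2][n - 2] ** 2 + f_before ** 2
    t_ab = B[n - 2][n - 2] * B[n - 2][n - 1]
    t_bb = B[n - 1][n - 1] ** 2 + B[n - 2][n - 1] ** 2
    delta = 0.5 * (t_aa - t_bb)
    sign = 1.0 if delta >= 0.0 else -1.0
    mu = t_bb - t_ab ** 2 / (delta + sign * math.hypot(delta, t_ab))
    y, z = B[0][0] ** 2 - mu, B[0][0] * B[0][1]          # t_11 - mu, t_12
    for k in range(n - 1):
        c, s = givens(y, z)
        for row in B:                                    # B <- B G(k, k+1, theta)
            row[k], row[k + 1] = c * row[k] - s * row[k + 1], s * row[k] + c * row[k + 1]
        if k > 0:
            B[k - 1][k + 1] = 0.0                        # bulge above superdiagonal removed
        c, s = givens(B[k][k], B[k + 1][k])
        rk, rk1 = B[k], B[k + 1]                         # B <- G(k, k+1, theta)^T B
        for j in range(n):
            rk[j], rk1[j] = c * rk[j] - s * rk1[j], s * rk[j] + c * rk1[j]
        B[k + 1][k] = 0.0                                # bulge below diagonal removed
        if k < n - 2:
            y, z = B[k][k + 1], B[k][k + 2]
    return B


B = [[1.0, 1.0, 0.0, 0.0],
     [0.0, 2.0, 1.0, 0.0],
     [0.0, 0.0, 3.0, 1.0],
     [0.0, 0.0, 0.0, 4.0]]
for it in range(1, 7):
    golub_kahan_step(B)
    print(it, abs(B[2][3]))   # |f_{n-1}| after each step
```

Repeated steps should drive $|f_{n-1}|$ toward zero rapidly while leaving $\|B\|_F$ unchanged.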


An efficient implementation of this algorithm would store $B$'s diagonal and superdiagonal in vectors $d(1{:}n)$ and $f(1{:}n-1)$, respectively, and would require $30n$ flops and $2n$ square roots. Accumulating $U$ requires $6mn$ flops. Accumulating $V$ requires $6n^2$ flops.

Typically, after a few of the above SVD iterations, the superdiagonal entry $f_{n-1}$ becomes negligible. Criteria for smallness within $B$'s band are usually of the form
$$|f_i| \leq \epsilon\,(|d_i| + |d_{i+1}|), \qquad |d_i| \leq \epsilon\,\|B\|,$$
where $\epsilon$ is a small multiple of the unit roundoff and $\|\cdot\|$ is some computationally convenient norm.

Combining Algorithm 5.4.2 (bidiagonalization), Algorithm 8.6.1, and 
the decoupling calculations mentioned earlier gives 


Algorithm 8.6.2 (The SVD Algorithm) Given $A \in \mathbb{R}^{m\times n}$ $(m \geq n)$ and $\epsilon$, a small multiple of the unit roundoff, the following algorithm overwrites $A$ with $U^TAV = D + E$, where $U \in \mathbb{R}^{m\times m}$ is orthogonal, $V \in \mathbb{R}^{n\times n}$ is orthogonal, $D \in \mathbb{R}^{m\times n}$ is diagonal, and $E$ satisfies $\|E\|_2 \approx \mathbf{u}\|A\|_2$.

    Use Algorithm 5.4.2 to compute the bidiagonalization
        $\begin{bmatrix} B \\ 0 \end{bmatrix} \leftarrow (U_1\cdots U_n)^TA(V_1\cdots V_{n-2})$
    until $q = n$
        Set $b_{i,i+1}$ to zero if $|b_{i,i+1}| \leq \epsilon(|b_{ii}| + |b_{i+1,i+1}|)$
            for any $i = 1{:}n-1$.
        Find the largest $q$ and the smallest $p$ such that if
            $B = \begin{bmatrix} B_{11} & 0 & 0 \\ 0 & B_{22} & 0 \\ 0 & 0 & B_{33} \end{bmatrix}$, with diagonal blocks of order $p$, $n-p-q$, and $q$,
        then $B_{33}$ is diagonal and $B_{22}$ has nonzero superdiagonal.
        if $q < n$
            if any diagonal entry in $B_{22}$ is zero, then zero
                the superdiagonal entry in the same row.
            else
                Apply Algorithm 8.6.1 to $B_{22}$,
                $B = \mathrm{diag}(I_p, U, I_{q+m-n})^T\,B\,\mathrm{diag}(I_p, V, I_q)$
            end
        end
    end


The amount of work required by this algorithm and its numerical properties are discussed in §5.4.5 and §5.5.8.
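The outer loop of Algorithm 8.6.2 can be sketched for an already bidiagonal $B$, so the Algorithm 5.4.2 step is skipped and $U$ and $V$ are not accumulated. This is a plain-Python illustration with hypothetical names. It repeats the Golub-Kahan step so that the block is self-contained, and it uses $\epsilon = 10^{-14}$ as an assumed small multiple of the unit roundoff. It does not implement the zero-diagonal case and raises an exception instead.

```python
import math


def givens(a, b):
    """Return (c, s) with [c s; -s c]^T [a; b] = [r; 0]."""
    if b == 0.0:
        return 1.0, 0.0
    if abs(b) > abs(a):
        tau = -a / b
        s = 1.0 / math.sqrt(1.0 + tau * tau)
        return s * tau, s
    tau = -b / a
    c = 1.0 / math.sqrt(1.0 + tau * tau)
    return c, c * tau


def gk_step(B):
    """Golub-Kahan SVD step on a dense, unreduced upper bidiagonal B (n >= 2)."""
    n = len(B)
    f_before = B[n - 3][n - 2] if n >= 3 else 0.0
    t_aa = B[n - 2][n - 2] ** 2 + f_before ** 2
    t_ab = B[n - 2][n - 2] * B[n - 2][n - 1]
    t_bb = B[n - 1][n - 1] ** 2 + B[n - 2][n - 1] ** 2
    delta = 0.5 * (t_aa - t_bb)
    sign = 1.0 if delta >= 0.0 else -1.0
    mu = t_bb - t_ab ** 2 / (delta + sign * math.hypot(delta, t_ab))   # Wilkinson shift
    y, z = B[0][0] ** 2 - mu, B[0][0] * B[0][1]
    for k in range(n - 1):
        c, s = givens(y, z)
        for row in B:
            row[k], row[k + 1] = c * row[k] - s * row[k + 1], s * row[k] + c * row[k + 1]
        if k > 0:
            B[k - 1][k + 1] = 0.0
        c, s = givens(B[k][k], B[k + 1][k])
        rk, rk1 = B[k], B[k + 1]
        for j in range(n):
            rk[j], rk1[j] = c * rk[j] - s * rk1[j], s * rk[j] + c * rk1[j]
        B[k + 1][k] = 0.0
        if k < n - 2:
            y, z = B[k][k + 1], B[k][k + 2]


def bidiagonal_svd(d, f, eps=1e-14, max_steps=1000):
    """Singular values (largest first) of the upper bidiagonal matrix with diagonal d, superdiagonal f."""
    n = len(d)
    B = [[0.0] * n for _ in range(n)]
    for i in range(n):
        B[i][i] = float(d[i])
        if i < n - 1:
            B[i][i + 1] = float(f[i])
    for _ in range(max_steps):
        for i in range(n - 1):                 # set negligible superdiagonal entries to zero
            if abs(B[i][i + 1]) <= eps * (abs(B[i][i]) + abs(B[i + 1][i + 1])):
                B[i][i + 1] = 0.0
        q = 0                                  # order of the trailing diagonal block B33
        while q < n - 1 and B[n - 2 - q][n - 1 - q] == 0.0:
            q += 1
        if q == n - 1:                         # B is diagonal
            return sorted((abs(B[i][i]) for i in range(n)), reverse=True)
        hi = n - q                             # B22 occupies rows/columns lo..hi-1
        lo = hi - 2
        while lo > 0 and B[lo - 1][lo] != 0.0:
            lo -= 1
        if any(B[i][i] == 0.0 for i in range(lo, hi)):
            raise NotImplementedError("zero diagonal entry: row zeroing not implemented")
        sub = [row[lo:hi] for row in B[lo:hi]]
        gk_step(sub)
        for i in range(lo, hi):
            B[i][lo:hi] = sub[i - lo]
    raise RuntimeError("no convergence within max_steps")


print(bidiagonal_svd([1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 1.0]))
```

Take $d = (1, 2, 3, 4)$ and $f = (1, 1, 1)$. The squares of the returned values should then sum to $\|B\|_F^2 = 33$, and their product should equal $|\det B| = 24$. Together these give a quick consistency check.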


Example 8.6.5 If Algorithm 8.6.2 is applied to
$$A = \begin{bmatrix} 1 & 1 & 0 & 0 \\ 0 & 2 & 1 & 0 \\ 0 & 0 & 3 & 1 \\ 0 & 0 & 0 & 4 \end{bmatrix}$$
then the superdiagonal elements converge to zero as follows:

    Iteration   O(|a_12|)   O(|a_23|)   O(|a_34|)
        1         10^0        10^0        10^0
        2         10^0        10^0        10^0
        3         10^0        10^0        10^0
        4         10^0        10^-1       10^-2
        5         10^0        10^-1       10^-8
        6         10^0        10^-1       10^-27
        7         10^0        10^-1       converg.
        8         10^0        10^-4
        9         10^-1       10^-14
       10         10^-1       converg.
       11         10^-4
       12         10^-12
       13         converg.

Observe the cubic-like convergence.
8.6.3 Jacobi SVD Procedures


It is straightforward to adapt the Jacobi procedures of §8.4 to the SVD problem. Instead of solving a sequence of 2-by-2 symmetric eigenproblems, we solve a sequence of 2-by-2 SVD problems. Thus, for a given index pair $(p,q)$ we compute a pair of rotations such that
$$\begin{bmatrix} c_1 & s_1 \\ -s_1 & c_1 \end{bmatrix}^T \begin{bmatrix} a_{pp} & a_{pq} \\ a_{qp} & a_{qq} \end{bmatrix} \begin{bmatrix} c_2 & s_2 \\ -s_2 & c_2 \end{bmatrix} = \begin{bmatrix} d_p & 0 \\ 0 & d_q \end{bmatrix}.$$
See P8.6.7. The resulting algorithm is referred to as two-sided because each update involves a pre- and a post-multiplication.

A one-sided Jacobi algorithm involves a sequence of pairwise column orthogonalizations. For a given index pair $(p,q)$ a Jacobi rotation $J(p,q,\theta)$ is determined so that columns $p$ and $q$ of $AJ(p,q,\theta)$ are orthogonal to each other. See P8.6.8. Note that this corresponds to zeroing the $(p,q)$ and $(q,p)$ entries in $A^TA$. Once $AV$ has sufficiently orthogonal columns, the rest of the SVD ($U$ and $\Sigma$) follows from column scaling: $AV = U\Sigma$.
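A minimal one-sided Jacobi sketch in plain Python follows. Each rotation is chosen from the 2-by-2 Gram matrix of columns $p$ and $q$, exactly as in the symmetric Jacobi method of §8.4. Only column inner products are formed, never the full $A^TA$. The function name, the relative orthogonality tolerance, and the sweep limit are assumptions of this sketch. The usage lines apply it to the leading column blocks of the 5-by-3 matrix in the example that follows Corollary 8.6.3, which should reproduce the interlacing pattern stated there.

```python
import math


def one_sided_jacobi_svd_values(A, tol=1e-14, max_sweeps=60):
    """Singular values (largest first) of an m-by-n matrix by one-sided Jacobi.

    Each rotation J(p, q, theta) makes columns p and q of the working copy W
    orthogonal, i.e. it zeroes the (p, q) entry of W^T W without forming W^T W.
    """
    m, n = len(A), len(A[0])
    W = [list(map(float, row)) for row in A]
    for _ in range(max_sweeps):
        rotated = False
        for p in range(n - 1):
            for q in range(p + 1, n):
                alpha = sum(W[i][p] ** 2 for i in range(m))
                beta = sum(W[i][q] ** 2 for i in range(m))
                gamma = sum(W[i][p] * W[i][q] for i in range(m))
                if abs(gamma) <= tol * math.sqrt(alpha * beta):
                    continue                     # columns already orthogonal to working accuracy
                rotated = True
                # symmetric Schur decomposition of [[alpha, gamma], [gamma, beta]]
                tau = (beta - alpha) / (2.0 * gamma)
                t = (1.0 if tau >= 0 else -1.0) / (abs(tau) + math.sqrt(1.0 + tau * tau))
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = t * c
                for i in range(m):               # W <- W J(p, q, theta)
                    wp, wq = W[i][p], W[i][q]
                    W[i][p], W[i][q] = c * wp - s * wq, s * wp + c * wq
        if not rotated:
            break
    # W = AV = U Sigma, so the singular values are the column norms of W
    norms = [math.sqrt(sum(W[i][j] ** 2 for i in range(m))) for j in range(n)]
    return sorted(norms, reverse=True)


A = [[1, 6, 11], [2, 7, 12], [3, 8, 13], [4, 9, 14], [5, 10, 15]]
for r in (1, 2, 3):
    print(r, one_sided_jacobi_svd_values([row[:r] for row in A]))
```

The printed values should match the four-digit figures quoted in that example.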


Problems 


P8.6.1 Show that if $B \in \mathbb{R}^{n\times n}$ is an upper bidiagonal matrix having a repeated singular value, then $B$ must have a zero on its diagonal or superdiagonal.

P8.6.2 Give formulae for the eigenvectors of $\begin{bmatrix} 0 & A^T \\ A & 0 \end{bmatrix}$ in terms of the singular vectors of $A \in \mathbb{R}^{m\times n}$ where $m \geq n$.


P8.6.3 Give an algorithm for reducing a complex matrix A to real bidiagonal form 
using complex Householder transformations. 


P8.6.4 Relate the singular values and vectors of $A = B + iC$ ($B, C \in \mathbb{R}^{m\times n}$) to those of
$$\begin{bmatrix} B & -C \\ C & B \end{bmatrix}.$$

P8.6.5 Complete the proof of Theorem 8.6.1.

P8.6.6 Assume that $n = 2m$ and that $S \in \mathbb{R}^{n\times n}$ is skew-symmetric and tridiagonal. Show that there exists a permutation $P \in \mathbb{R}^{n\times n}$ such that $P^TSP$ has the following form:
$$P^TSP = \begin{bmatrix} 0 & -B^T \\ B & 0 \end{bmatrix}, \qquad B \in \mathbb{R}^{m\times m}.$$
Describe $B$. Show how to compute the eigenvalues and eigenvectors of $S$ via the SVD of $B$. Repeat for the case $n = 2m + 1$.


P8.6.7 (a) Let
$$C = \begin{bmatrix} w & x \\ y & z \end{bmatrix}$$
be real. Give a stable algorithm for computing $c$ and $s$ with $c^2 + s^2 = 1$ such that
$$B = \begin{bmatrix} c & s \\ -s & c \end{bmatrix} C$$
is symmetric. (b) Combine (a) with the Jacobi trigonometric calculations in the text to obtain a stable algorithm for computing the SVD of $C$. (c) Part (b) can be used to develop a Jacobi-like algorithm for computing the SVD of $A \in \mathbb{R}^{n\times n}$. For a given $(p,q)$ with $p < q$, Jacobi transformations $J(p,q,\theta_1)$ and $J(p,q,\theta_2)$ are determined such that if
$$B = J(p,q,\theta_1)^TAJ(p,q,\theta_2),$$
then $b_{pq} = b_{qp} = 0$. Show
$$\mathrm{off}(B)^2 = \mathrm{off}(A)^2 - a_{pq}^2 - a_{qp}^2.$$
How might $p$ and $q$ be determined? How could the algorithm be adapted to handle the case when $A \in \mathbb{R}^{m\times n}$ with $m > n$?


P8.6.8 Let $x$ and $y$ be in $\mathbb{R}^m$ and define the orthogonal matrix $Q$ by
$$Q = \begin{bmatrix} c & s \\ -s & c \end{bmatrix}.$$
Give a stable algorithm for computing $c$ and $s$ such that the columns of $[x, y]Q$ are orthogonal to each other.


P8.6.9 Suppose $B \in \mathbb{R}^{n\times n}$ is upper bidiagonal with $b_{nn} = 0$. Show how to construct orthogonal $U$ and $V$ (products of Givens rotations) so that $U^TBV$ is upper bidiagonal with a zero $n$th column.


P8.6.10 Suppose $B \in \mathbb{R}^{n\times n}$ is upper bidiagonal with diagonal entries $d(1{:}n)$ and superdiagonal entries $f(1{:}n-1)$. State and prove a singular value version of Theorem 8.5.1.
8.7 Some Generalized Eigenvalue Problems


Given a symmetric matrix $A \in \mathbb{R}^{n\times n}$ and a symmetric positive definite $B \in \mathbb{R}^{n\times n}$, we consider the problem of finding a nonzero vector $x$ and a scalar $\lambda$ so $Ax = \lambda Bx$. This is the symmetric-definite generalized eigenproblem. The scalar $\lambda$ can be thought of as a generalized eigenvalue. As $\lambda$ varies, $A - \lambda B$ defines a pencil and our job is to determine
$$\lambda(A,B) = \{\lambda \mid \det(A - \lambda B) = 0\}.$$

A symmetric-definite generalized eigenproblem can be transformed to an equivalent problem with a congruence transformation:
$$A - \lambda B \text{ is singular} \iff (X^TAX) - \lambda(X^TBX) \text{ is singular}.$$
Thus, if $X$ is nonsingular, then $\lambda(A,B) = \lambda(X^TAX, X^TBX)$.

In this section we present various structure-preserving procedures that 
solve such eigenproblems through the careful selection of X. The related 
generalized singular value decomposition problem is also discussed. 


8.7.1 Mathematical Background 


We seek a stable, efficient algorithm that computes $X$ such that $X^TAX$ and $X^TBX$ are both in "canonical form." The obvious form to aim for is diagonal form.


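Theorem 8.7.1 Suppose $A$ and $B$ are $n$-by-$n$ symmetric matrices, and define $C(\mu)$ by
$$C(\mu) = \mu A + (1 - \mu)B, \qquad \mu \in \mathbb{R}. \qquad (8.7.1)$$
If there exists a $\mu \in [0,1]$ such that $C(\mu)$ is non-negative definite and
$$\mathrm{null}(C(\mu)) = \mathrm{null}(A) \cap \mathrm{null}(B)$$
then there exists a nonsingular $X$ such that both $X^TAX$ and $X^TBX$ are diagonal.

Proof. Let $\mu \in [0,1]$ be chosen so that $C(\mu)$ is non-negative definite with the property that $\mathrm{null}(C(\mu)) = \mathrm{null}(A) \cap \mathrm{null}(B)$. Let
$$Q_1^TC(\mu)Q_1 = \begin{bmatrix} D & 0 \\ 0 & 0 \end{bmatrix}, \qquad D = \mathrm{diag}(d_1,\ldots,d_k), \quad d_i > 0$$
be the Schur decomposition of $C(\mu)$ and define $X_1 = Q_1\,\mathrm{diag}(D^{-1/2}, I_{n-k})$. If $A_1 = X_1^TAX_1$, $B_1 = X_1^TBX_1$, and $C_1 = X_1^TC(\mu)X_1$, then
$$C_1 = \begin{bmatrix} I_k & 0 \\ 0 & 0 \end{bmatrix} = \mu A_1 + (1 - \mu)B_1.$$
Since $\mathrm{span}\{e_{k+1},\ldots,e_n\} = \mathrm{null}(C_1) = \mathrm{null}(A_1) \cap \mathrm{null}(B_1)$ it follows that $A_1$ and $B_1$ have the following block structure:
$$A_1 = \begin{bmatrix} A_{11} & 0 \\ 0 & 0 \end{bmatrix}, \qquad B_1 = \begin{bmatrix} B_{11} & 0 \\ 0 & 0 \end{bmatrix}, \qquad A_{11}, B_{11} \in \mathbb{R}^{k\times k}.$$
Moreover $I_k = \mu A_{11} + (1 - \mu)B_{11}$.

Suppose $\mu \neq 0$. It then follows that if $Z^TB_{11}Z = \mathrm{diag}(b_1,\ldots,b_k)$ is the Schur decomposition of $B_{11}$ and we set $X = X_1\,\mathrm{diag}(Z, I_{n-k})$ then
$$X^TBX = \mathrm{diag}(b_1,\ldots,b_k,0,\ldots,0) \equiv D_B$$
and
$$X^TAX = \frac{1}{\mu}X^T\left(C(\mu) - (1 - \mu)B\right)X = \frac{1}{\mu}\left(\begin{bmatrix} I_k & 0 \\ 0 & 0 \end{bmatrix} - (1 - \mu)D_B\right) \equiv D_A.$$
On the other hand, if $\mu = 0$, then let $Z^TA_{11}Z = \mathrm{diag}(a_1,\ldots,a_k)$ be the Schur decomposition of $A_{11}$ and set $X = X_1\,\mathrm{diag}(Z, I_{n-k})$. It is easy to verify that in this case as well, both $X^TAX$ and $X^TBX$ are diagonal. □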


Frequently, the conditions in Theorem 8.7.1 are satisfied because either A 
or B is positive definite. 


Corollary 8.7.2 If $A - \lambda B \in \mathbb{R}^{n\times n}$ is symmetric-definite, then there exists a nonsingular $X = [x_1,\ldots,x_n]$ such that
$$X^TAX = \mathrm{diag}(a_1,\ldots,a_n) \quad\text{and}\quad X^TBX = \mathrm{diag}(b_1,\ldots,b_n).$$
Moreover, $Ax_i = \lambda_iBx_i$ for $i = 1{:}n$ where $\lambda_i = a_i/b_i$.

Proof. By setting $\mu = 0$ in Theorem 8.7.1 we see that symmetric-definite pencils can be simultaneously diagonalized. The rest of the corollary is easily verified. □


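Example 8.7.1 If
$$A = \begin{bmatrix} 229 & 163 \\ 163 & 116 \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} 81 & 59 \\ 59 & 43 \end{bmatrix}$$
then $A - \lambda B$ is symmetric-definite and $\lambda(A,B) = \{5, -1/2\}$. If
$$X = \begin{bmatrix} 3 & -5 \\ -4 & 7 \end{bmatrix}$$
then $X^TAX = \mathrm{diag}(5, -1)$ and $X^TBX = \mathrm{diag}(1, 2)$.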


Stewart (1979) has worked out a perturbation theory for symmetric pencils $A - \lambda B$ that satisfy
$$c(A,B) = \min_{\|x\|_2 = 1} \sqrt{(x^TAx)^2 + (x^TBx)^2} > 0. \qquad (8.7.2)$$
The scalar $c(A,B)$ is called the Crawford number of the pencil $A - \lambda B$.


Theorem 8.7.3 Suppose $A - \lambda B$ is an $n$-by-$n$ symmetric-definite pencil with eigenvalues
$$\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n.$$
Suppose $E_A$ and $E_B$ are symmetric $n$-by-$n$ matrices that satisfy
$$\epsilon^2 = \|E_A\|_2^2 + \|E_B\|_2^2 < c(A,B).$$
Then $(A + E_A) - \lambda(B + E_B)$ is symmetric-definite with eigenvalues
$$\mu_1 \geq \cdots \geq \mu_n$$
that satisfy
$$|\arctan(\lambda_i) - \arctan(\mu_i)| \leq \arctan(\epsilon/c(A,B))$$
for $i = 1{:}n$.

Proof. See Stewart (1979). □


8.7.2 Methods for the Symmetric-Definite Problem 


Turning to algorithmic matters, we first present a method for solving the 
symmetric-definite problem that utilizes both the Cholesky factorization 
and the symmetric QR algorithm. 


Algorithm 8.7.1 Given $A = A^T \in \mathbb{R}^{n\times n}$ and $B = B^T \in \mathbb{R}^{n\times n}$ with $B$ positive definite, the following algorithm computes a nonsingular $X$ such that $X^TBX = I_n$ and $X^TAX = \mathrm{diag}(a_1,\ldots,a_n)$.

    Compute the Cholesky factorization $B = GG^T$
        using Algorithm 4.2.2.
    Compute $C = G^{-1}AG^{-T}$.
    Use the symmetric QR algorithm to compute the Schur
        decomposition $Q^TCQ = \mathrm{diag}(a_1,\ldots,a_n)$.
    Set $X = G^{-T}Q$.
This algorithm requires about $14n^3$ flops. In a practical implementation, $A$ can be overwritten by the matrix $C$. See Martin and Wilkinson (1968c) for details. Note that
$$\lambda(A,B) = \lambda(A, GG^T) = \lambda(G^{-1}AG^{-T}, I) = \lambda(C) = \{a_1,\ldots,a_n\}.$$
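Algorithm 8.7.1 can be sketched in plain Python as follows. This is an illustration under stated assumptions, not a library routine. The helper names are hypothetical, dense lists of lists stand in for matrices, and a cyclic Jacobi eigensolver (§8.4) is swapped in for the symmetric QR algorithm in step 3. The usage lines apply it to the pencil of Example 8.7.1, with $A = [229\ 163;\ 163\ 116]$ and $B = [81\ 59;\ 59\ 43]$, for which the computed $a_i$ should be close to $5$ and $-1/2$.

```python
import math


def cholesky_lower(B):
    """Lower triangular G with B = G G^T (B symmetric positive definite)."""
    n = len(B)
    G = [[0.0] * n for _ in range(n)]
    for j in range(n):
        djj = B[j][j] - sum(G[j][k] ** 2 for k in range(j))
        if djj <= 0.0:
            raise ValueError("B is not positive definite")
        G[j][j] = math.sqrt(djj)
        for i in range(j + 1, n):
            G[i][j] = (B[i][j] - sum(G[i][k] * G[j][k] for k in range(j))) / G[j][j]
    return G


def lower_solve(G, b):
    """Solve G y = b for lower triangular G (forward substitution)."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(G[i][k] * y[k] for k in range(i))) / G[i][i])
    return y


def lower_transpose_solve(G, b):
    """Solve G^T x = b for lower triangular G (back substitution)."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(G[k][i] * x[k] for k in range(i + 1, n))) / G[i][i]
    return x


def jacobi_eig(S, tol=1e-14, max_sweeps=50):
    """Cyclic Jacobi: returns (eigenvalues, V) with V^T S V diagonal."""
    n = len(S)
    A = [row[:] for row in S]
    V = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = math.sqrt(sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j))
        if off <= tol * math.sqrt(sum(x * x for row in A for x in row)):
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if A[p][q] == 0.0:
                    continue
                tau = (A[q][q] - A[p][p]) / (2.0 * A[p][q])
                t = (1.0 if tau >= 0 else -1.0) / (abs(tau) + math.sqrt(1.0 + tau * tau))
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = t * c
                for M in (A, V):                     # A <- A J and V <- V J
                    for row in M:
                        rp, rq = row[p], row[q]
                        row[p], row[q] = c * rp - s * rq, s * rp + c * rq
                rp, rq = A[p], A[q]                  # A <- J^T A
                A[p] = [c * x - s * y for x, y in zip(rp, rq)]
                A[q] = [s * x + c * y for x, y in zip(rp, rq)]
    return [A[i][i] for i in range(n)], V


def symmetric_definite_eig(A, B):
    """Returns (a, X_cols) with X^T B X = I and X^T A X = diag(a)."""
    n = len(A)
    G = cholesky_lower(B)
    M_cols = [lower_solve(G, [A[i][j] for i in range(n)]) for j in range(n)]   # columns of G^{-1} A
    # C = G^{-1} (G^{-1} A)^T because C is symmetric; column i uses row i of G^{-1} A
    C_cols = [lower_solve(G, [M_cols[j][i] for j in range(n)]) for i in range(n)]
    C = [[0.5 * (C_cols[j][i] + C_cols[i][j]) for j in range(n)] for i in range(n)]
    a, Q = jacobi_eig(C)                  # stand-in for the symmetric QR algorithm
    X_cols = [lower_transpose_solve(G, [Q[i][k] for i in range(n)]) for k in range(n)]   # X = G^{-T} Q
    return a, X_cols


A = [[229.0, 163.0], [163.0, 116.0]]
B = [[81.0, 59.0], [59.0, 43.0]]
print(symmetric_definite_eig(A, B))
```

The returned columns are scaled so that $X^TBX = I$. They are therefore multiples of the columns of the $X$ in Example 8.7.1, which instead gives $X^TBX = \mathrm{diag}(1, 2)$.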


If $\hat a_i$ is a computed eigenvalue obtained by Algorithm 8.7.1, then it can be shown that $\hat a_i \in \lambda(G^{-1}AG^{-T} + E_i)$, where $\|E_i\|_2 \approx \mathbf{u}\|A\|_2\|B^{-1}\|_2$. Thus, if $B$ is ill-conditioned, then $\hat a_i$ may be severely contaminated with roundoff error even if $a_i$ is a well-conditioned generalized eigenvalue. The problem, of course, is that in this case, the matrix $C = G^{-1}AG^{-T}$ can have some very large entries if $B$, and hence $G$, is ill-conditioned. This difficulty can sometimes be overcome by replacing the matrix $G$ in Algorithm 8.7.1 with $VD^{1/2}$ where $V^TBV = D$ is the Schur decomposition of $B$, so that $C = D^{-1/2}V^TAVD^{-1/2}$. If the diagonal entries of $D$ are ordered from smallest to largest, then the large entries in $C$ are concentrated in the upper left-hand corner. The small eigenvalues of $C$ can then be computed without excessive roundoff error contamination (or so the heuristic goes). For further discussion, consult Wilkinson (1965, pp. 337-38).


Example 8.7.2 If
$$A = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 4 & 5 \\ 3 & 5 & 6 \end{bmatrix} \quad\text{and}\quad G = \begin{bmatrix} .001 & 0 & 0 \\ 1 & .001 & 0 \\ 2 & 1 & .001 \end{bmatrix}$$
and $B = GG^T$, then the two smallest eigenvalues of $A - \lambda B$ are
$$a_1 = -0.619402940600584, \qquad a_2 = 1.627440079051887.$$
If 17-digit floating point arithmetic is used, then these eigenvalues are computed to full machine precision when the symmetric QR algorithm is applied to $fl(D^{-1/2}V^TAVD^{-1/2})$, where $B = VDV^T$ is the Schur decomposition of $B$. On the other hand, if Algorithm 8.7.1 is applied, then
$$\hat a_1 = -0.619373517376444, \qquad \hat a_2 = 1.627516601905228.$$
The reason for obtaining only four correct significant digits is that $\kappa_2(B) \approx 10^{18}$.


The condition of the matrix $X$ in Algorithm 8.7.1 can sometimes be improved by replacing $B$ with a suitable convex combination of $A$ and $B$. The connection between the eigenvalues of the modified pencil and those of the original is detailed in the proof of Theorem 8.7.1.

Other difficulties concerning Algorithm 8.7.1 revolve around the fact that $G^{-1}AG^{-T}$ is generally full even when $A$ and $B$ are sparse. This is a serious problem, since many of the symmetric-definite problems arising in practice are large and sparse.

Crawford (1973) has shown how to implement Algorithm 8.7.1 effec- 
tively when A and B are banded. Aside from this case, however, the si- 
multaneous diagonalization approach is impractical for the large, sparse 
symmetric-definite problem. 
An alternative idea is to extend the Rayleigh quotient iteration (8.4.4) as follows:

    $x_0$ given with $\|x_0\|_2 = 1$
    for $k = 0, 1, \ldots$
        $\mu_k = x_k^TAx_k / x_k^TBx_k$
        Solve $(A - \mu_kB)z_{k+1} = Bx_k$ for $z_{k+1}$.          (8.7.3)
        $x_{k+1} = z_{k+1}/\|z_{k+1}\|_2$
    end


The mathematical basis for this iteration is that
$$\lambda = \frac{x^TAx}{x^TBx} \qquad (8.7.4)$$
minimizes
$$f(\lambda) = \|Ax - \lambda Bx\|_B \qquad (8.7.5)$$
where $\|\cdot\|_B$ is defined by $\|z\|_B^2 = z^TB^{-1}z$. Indeed, $f(\lambda)^2 = x^TAB^{-1}Ax - 2\lambda\,x^TAx + \lambda^2\,x^TBx$ is a quadratic in $\lambda$ whose minimizer is (8.7.4). The mathematical properties of (8.7.3) are similar to those of (8.4.4). Its applicability depends on whether or not systems of the form $(A - \mu B)z = x$ can be readily solved. A similar comment pertains to the following generalized orthogonal iteration:


    $Q_0 \in \mathbb{R}^{n\times p}$ given with $Q_0^TQ_0 = I_p$
    for $k = 1, 2, \ldots$
        Solve $BZ_k = AQ_{k-1}$ for $Z_k$.          (8.7.6)
        $Z_k = Q_kR_k$ (QR factorization)
    end

This is mathematically equivalent to (7.3.4) with $A$ replaced by $B^{-1}A$. Its practicality depends on how easy it is to solve linear systems of the form $Bz = y$.
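Both iterations can be tried on the small pencil of Example 8.7.1 with the plain-Python sketch below. It assumes dense matrices stored as lists of lists. It uses Gaussian elimination with partial pivoting for the linear solves and modified Gram-Schmidt for the QR step. Iteration (8.7.3) stops once the residual $\|Ax - \mu Bx\|_2$ is small relative to $\|A\|_F + |\mu|\,\|B\|_F$. These choices and all function names are illustrative assumptions. With $p = 1$, iteration (8.7.6) should converge to an eigenvector for the eigenvalue of $B^{-1}A$ of largest modulus, which here is $5$.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(M, x):
    return [dot(row, x) for row in M]


def frobenius(M):
    return math.sqrt(sum(v * v for row in M for v in row))


def solve(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting (small dense M)."""
    n = len(b)
    T = [[float(v) for v in M[i]] + [float(b[i])] for i in range(n)]
    for k in range(n):
        piv = max(range(k, n), key=lambda i: abs(T[i][k]))
        if T[piv][k] == 0.0:
            raise ZeroDivisionError("singular system")
        T[k], T[piv] = T[piv], T[k]
        for i in range(k + 1, n):
            mult = T[i][k] / T[k][k]
            for j in range(k, n + 1):
                T[i][j] -= mult * T[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (T[i][n] - sum(T[i][j] * x[j] for j in range(i + 1, n))) / T[i][i]
    return x


def generalized_rqi(A, B, x0, tol=1e-12, max_iter=50):
    """Iteration (8.7.3) for A x = lambda B x; returns (mu, x)."""
    n = len(x0)
    x = [v / math.sqrt(dot(x0, x0)) for v in x0]
    mu = 0.0
    for _ in range(max_iter):
        Ax, Bx = matvec(A, x), matvec(B, x)
        mu = dot(x, Ax) / dot(x, Bx)                    # Rayleigh quotient (8.7.4)
        r = [a - mu * b for a, b in zip(Ax, Bx)]
        if math.sqrt(dot(r, r)) <= tol * (frobenius(A) + abs(mu) * frobenius(B)):
            break                                       # eigenpair to working accuracy
        z = solve([[A[i][j] - mu * B[i][j] for j in range(n)] for i in range(n)], Bx)
        nz = math.sqrt(dot(z, z))
        x = [v / nz for v in z]
    return mu, x


def generalized_orthogonal_iteration(A, B, Q0, steps=50):
    """Iteration (8.7.6); Q0 is a list of p orthonormal columns."""
    Q = [col[:] for col in Q0]
    for _ in range(steps):
        Z = [solve(B, matvec(A, q)) for q in Q]         # solve B Z_k = A Q_{k-1}
        Q = []
        for z in Z:                                     # Z_k = Q_k R_k by modified Gram-Schmidt
            v = z[:]
            for q in Q:
                proj = dot(q, v)
                v = [a - proj * b for a, b in zip(v, q)]
            nv = math.sqrt(dot(v, v))
            Q.append([a / nv for a in v])
    return Q


A = [[229.0, 163.0], [163.0, 116.0]]
B = [[81.0, 59.0], [59.0, 43.0]]
print(generalized_rqi(A, B, [1.0, 0.0]))
print(generalized_orthogonal_iteration(A, B, [[1.0, 0.0]]))
```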

Sometimes A and B are so large that neither (8.7.3) nor (8.7.6) can be 
invoked. In this situation, one can resort to any of a number of gradient 
and coordinate relaxation algorithms. See Stewart (1976) for an extensive 
guide to the literature. 


8.7.3 The Generalized Singular Value Problem 


We conclude with some remarks about symmetric pencils that have the form $A^TA - \lambda B^TB$ where $A \in \mathbb{R}^{m\times n}$ and $B \in \mathbb{R}^{p\times n}$. This pencil underlies the generalized singular value decomposition (GSVD), a decomposition that is useful in several constrained least squares problems. (Cf. §12.1.) Note that by Theorem 8.7.1 there exists a nonsingular $X \in \mathbb{R}^{n\times n}$ such that $X^T(A^TA)X$ and $X^T(B^TB)X$ are both diagonal. The value of the GSVD is that these diagonalizations can be achieved without forming $A^TA$ and $B^TB$.


Theorem 8.7.4 (Generalized Singular Value Decomposition) If we have $A \in \mathbb{R}^{m\times n}$ with $m \geq n$ and $B \in \mathbb{R}^{p\times n}$, then there exist orthogonal $U \in \mathbb{R}^{m\times m}$ and $V \in \mathbb{R}^{p\times p}$ and an invertible $X \in \mathbb{R}^{n\times n}$ such that
$$U^TAX = C = \mathrm{diag}(c_1,\ldots,c_n), \qquad c_i \geq 0$$
and
$$V^TBX = S = \mathrm{diag}(s_1,\ldots,s_q), \qquad s_i \geq 0$$
where $q = \min(p,n)$.

Proof. The proof of this decomposition appears in Van Loan (1976). We present a more constructive proof along the lines of Paige and Saunders (1981). For clarity we assume that $\mathrm{null}(A) \cap \mathrm{null}(B) = \{0\}$ and $p \geq n$. We leave it to the reader to extend the proof so that it covers these cases.

Let
$$\begin{bmatrix} A \\ B \end{bmatrix} = \begin{bmatrix} Q_1 \\ Q_2 \end{bmatrix} R$$
be a QR factorization with $Q_1 \in \mathbb{R}^{m\times n}$, $Q_2 \in \mathbb{R}^{p\times n}$, and $R \in \mathbb{R}^{n\times n}$. Paige and Saunders show that the SVDs of $Q_1$ and $Q_2$ are related in the sense that
$$Q_1 = UCW^T, \qquad Q_2 = VSW^T. \qquad (8.7.7)$$
Here, $U$, $V$, and $W$ are orthogonal, $C = \mathrm{diag}(c_i)$ with $0 \leq c_1 \leq \cdots \leq c_n$, $S = \mathrm{diag}(s_i)$ with $s_1 \geq \cdots \geq s_n$, and $C^TC + S^TS = I_n$. The decomposition (8.7.7) is a variant of the CS decomposition in §2.6 and from it we conclude that $A = Q_1R = UC(W^TR)$ and $B = Q_2R = VS(W^TR)$. The theorem follows by setting $X = (W^TR)^{-1}$, $D_A = C$, and $D_B = S$. The invertibility of $R$ follows from our assumption that $\mathrm{null}(A) \cap \mathrm{null}(B) = \{0\}$. □


The elements of the set $\sigma(A,B) = \{c_1/s_1,\ldots,c_n/s_n\}$ are referred to as the generalized singular values of $A$ and $B$. Note that $\sigma \in \sigma(A,B)$ implies that $\sigma^2 \in \lambda(A^TA, B^TB)$. The theorem is a generalization of the SVD in that if $B = I_n$, then $\sigma(A,B) = \sigma(A)$.

Our proof of the GSVD is of practical importance since Stewart (1983) and Van Loan (1985) have shown how to stably compute the CS decomposition. The only tricky part is the inversion of $W^TR$ to get $X$. Note that the columns of $X = [x_1,\ldots,x_n]$ satisfy
$$s_i^2A^TAx_i = c_i^2B^TBx_i, \qquad i = 1{:}n$$
and so if $s_i \neq 0$ then $A^TAx_i = \sigma_i^2B^TBx_i$ where $\sigma_i = c_i/s_i$. Thus, the $x_i$ are aptly termed the generalized singular vectors of the pair $(A,B)$.
In several applications an orthonormal basis for some designated generalized singular vector subspace $\mathrm{span}\{x_{i_1},\ldots,x_{i_k}\}$ is required. We show how this can be accomplished without any matrix inversions or cross products:

  • Compute the QR factorization
$$\begin{bmatrix} A \\ B \end{bmatrix} = \begin{bmatrix} Q_1 \\ Q_2 \end{bmatrix} R.$$
  • Compute the CS decomposition
$$Q_1 = UCW^T, \qquad Q_2 = VSW^T$$
    and order the diagonals of $C$ and $S$ so that
$$\{c_1/s_1,\ldots,c_k/s_k\} = \{c_{i_1}/s_{i_1},\ldots,c_{i_k}/s_{i_k}\}.$$
  • Compute orthogonal $Z$ and upper triangular $T$ so $TZ = W^TR$. (See P8.7.5.) Note that if $X^{-1} = W^TR = TZ$, then $X = Z^TT^{-1}$ and so the first $k$ rows of $Z$ are an orthonormal basis for $\mathrm{span}\{x_1,\ldots,x_k\}$.


Problems 


P8.7.1 Suppose $A \in \mathbb{R}^{n\times n}$ is symmetric and $G \in \mathbb{R}^{n\times n}$ is lower triangular and nonsingular. Give an efficient algorithm for computing $C = G^{-1}AG^{-T}$.

P8.7.2 Suppose $A \in \mathbb{R}^{n\times n}$ is symmetric and $B \in \mathbb{R}^{n\times n}$ is symmetric positive definite. Give an algorithm for computing the eigenvalues of $AB$ that uses the Cholesky factorization and the symmetric QR algorithm.

P8.7.3 Show that if $C$ is real and diagonalizable, then there exist symmetric matrices $A$ and $B$, $B$ nonsingular, such that $C = AB^{-1}$. This shows that symmetric pencils $A - \lambda B$ are essentially general.

P8.7.4 Show how to convert an $Ax = \lambda Bx$ problem into a generalized singular value problem if $A$ and $B$ are both symmetric and non-negative definite.

P8.7.5 Given $Y \in \mathbb{R}^{n\times n}$ show how to compute Householder matrices $H_2,\ldots,H_n$ so that $YH_n\cdots H_2 = T$ is upper triangular. Hint: $H_k$ zeros out the $k$th row.

P8.7.6 Suppose
$$\begin{bmatrix} 0 & A \\ A^T & 0 \end{bmatrix}\begin{bmatrix} y \\ x \end{bmatrix} = \lambda\begin{bmatrix} B_1 & 0 \\ 0 & B_2 \end{bmatrix}\begin{bmatrix} y \\ x \end{bmatrix}$$
where $A \in \mathbb{R}^{m\times n}$, $B_1 \in \mathbb{R}^{m\times m}$, and $B_2 \in \mathbb{R}^{n\times n}$. Assume that $B_1$ and $B_2$ are positive definite with Cholesky triangles $G_1$ and $G_2$ respectively. Relate the generalized eigenvalues of this problem to the singular values of $G_1^{-1}AG_2^{-T}$.

P8.7.7 Suppose $A$ and $B$ are both symmetric positive definite. Show how to compute $\lambda(A,B)$ and the corresponding eigenvectors using the Cholesky factorization and CS decomposition.


Notes and References for Sec. 8.7 
Chapter 9


Lanczos Methods 


§9.1 Derivation and Convergence Properties
§9.2 Practical Lanczos Procedures
§9.3 Applications to $Ax = b$ and Least Squares
§9.4 Arnoldi and Unsymmetric Lanczos


In this chapter we develop the Lanczos method, a technique that can be used to solve certain large, sparse, symmetric eigenproblems $Ax = \lambda x$. The method involves partial tridiagonalizations of the given matrix $A$. However, unlike the Householder approach, no intermediate, full submatrices are generated. Equally important, information about $A$'s extremal eigenvalues tends to emerge long before the tridiagonalization is complete. This makes the Lanczos algorithm particularly useful in situations where a few of $A$'s largest or smallest eigenvalues are desired.

The derivation and exact arithmetic attributes of the method are presented in §9.1. The key aspects of the Kaniel-Paige theory are detailed. This theory explains the extraordinary convergence properties of the Lanczos process. Unfortunately, roundoff errors make the Lanczos method somewhat difficult to use in practice. The central problem is a loss of orthogonality among the Lanczos vectors that the iteration produces. There are several ways to cope with this, as we discuss in §9.2.

In §9.3 we show how the “Lanczos idea” can be applied to solve an as- 
sortment of singular value, least squares, and linear equations problems. Of 
particular interest is the development of the conjugate gradient method for 
symmetric positive definite linear systems. The Lanczos-conjugate gradient 
connection is explored further in the next chapter. In §9.4 we discuss the 
Arnoldi iteration which is based on the Hessenberg decomposition and a 
version of the Lanczos process that can (sometimes) be used to tridiagonalize unsymmetric matrices.


Before You Begin 


Chapters 5 and 8 are required for §9.1-9.3 and Chapter 7 is needed for §9.4. Within this chapter there are the following dependencies:

    §9.1 → §9.2 → §9.3
      ↓
    §9.4


A wide range of Lanczos papers are collected in Brown, Chu, Ellison, and 
Plemmons (1994). Other complementary references include Parlett (1980), 
Saad (1992), and Chatelin (1993). The two volume work by Cullum and 
Willoughby (1985a,1985b) includes both analysis and software. 


9.1 Derivation and Convergence Properties 


Suppose $A \in \mathbb{R}^{n\times n}$ is large, sparse, and symmetric and assume that a few of its largest and/or smallest eigenvalues are desired. This problem can be solved by a method attributed to Lanczos (1950). The method generates a sequence of tridiagonal matrices $T_k$ with the property that the extremal eigenvalues of $T_k \in \mathbb{R}^{k\times k}$ are progressively better estimates of $A$'s extremal eigenvalues. In this section, we derive the technique and investigate its exact arithmetic properties. Throughout the section $\lambda_i(\cdot)$ designates the $i$th largest eigenvalue.


9.1.1 Krylov Subspaces 


The derivation of the Lanczos algorithm can proceed in several ways. So 
that its remarkable convergence properties do not come as a complete sur- 
prise, we prefer to lead into the technique by considering the optimization 
of the Rayleigh quotient 


$$r(x) = \frac{x^TAx}{x^Tx}, \qquad x \neq 0.$$


Recall from Theorem 8.1.2 that the maximum and minimum values of $r(x)$
are $\lambda_1(A)$ and $\lambda_n(A)$, respectively. Suppose $\{q_i\} \subseteq \mathbb{R}^n$ is a sequence of
orthonormal vectors and define the scalars $M_k$ and $m_k$ by


$$M_k = \lambda_1(Q_k^TAQ_k) = \max_{y\neq 0}\frac{y^T(Q_k^TAQ_k)y}{y^Ty} = \max_{\|y\|_2=1} r(Q_ky) \le \lambda_1(A)$$
$$m_k = \lambda_k(Q_k^TAQ_k) = \min_{y\neq 0}\frac{y^T(Q_k^TAQ_k)y}{y^Ty} = \min_{\|y\|_2=1} r(Q_ky) \ge \lambda_n(A)$$


where $Q_k = [\,q_1,\ldots,q_k\,]$. The Lanczos algorithm can be derived by considering
how to generate the $q_k$ so that $M_k$ and $m_k$ are increasingly better
estimates of $\lambda_1(A)$ and $\lambda_n(A)$.

Suppose $u_k \in \mathrm{span}\{q_1,\ldots,q_k\}$ is such that $M_k = r(u_k)$. Since $r(x)$
increases most rapidly in the direction of the gradient
$$\nabla r(x) = \frac{2}{x^Tx}\bigl(Ax - r(x)x\bigr),$$
we can ensure that $M_{k+1} > M_k$ if $q_{k+1}$ is determined so


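$$\nabla r(u_k) \in \mathrm{span}\{q_1,\ldots,q_{k+1}\}. \qquad (9.1.1)$$

(This assumes $\nabla r(u_k) \neq 0$.) Likewise, if $v_k \in \mathrm{span}\{q_1,\ldots,q_k\}$ satisfies
$r(v_k) = m_k$, then it makes sense to require

$$\nabla r(v_k) \in \mathrm{span}\{q_1,\ldots,q_{k+1}\} \qquad (9.1.2)$$

since $r(x)$ decreases most rapidly in the direction of $-\nabla r(x)$.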

At first glance, the task of finding a single $q_{k+1}$ that satisfies these two
requirements appears impossible. However, since $\nabla r(x) \in \mathrm{span}\{x, Ax\}$, it
is clear that (9.1.1) and (9.1.2) can be simultaneously satisfied if
$$\mathrm{span}\{q_1,\ldots,q_k\} = \mathrm{span}\{q_1, Aq_1, \ldots, A^{k-1}q_1\}$$
and we choose $q_{k+1}$ so
$$\mathrm{span}\{q_1,\ldots,q_{k+1}\} = \mathrm{span}\{q_1, Aq_1, \ldots, A^{k-1}q_1, A^kq_1\}.$$


Thus, we are led to the problem of computing orthonormal bases for the
Krylov subspaces
$$K(A,q_1,k) = \mathrm{span}\{q_1, Aq_1, \ldots, A^{k-1}q_1\}.$$
These are just the range spaces of the Krylov matrices
$$K(A,q_1,n) = [\,q_1,\ Aq_1,\ A^2q_1,\ \ldots,\ A^{n-1}q_1\,]$$
presented in §8.3.2.


9.1.2  Tridiagonalization 


In order to find this basis efficiently we exploit the connection between the
tridiagonalization of $A$ and the QR factorization of $K(A,q_1,n)$. Recall that
if $Q^TAQ = T$ is tridiagonal with $Qe_1 = q_1$, then
$$K(A,q_1,n) = Q\,[\,e_1,\ Te_1,\ T^2e_1,\ \ldots,\ T^{n-1}e_1\,]$$




is the QR factorization of $K(A,q_1,n)$ where $e_1 = I_n(:,1)$. Thus the $q_k$ can
effectively be generated by tridiagonalizing $A$ with an orthogonal matrix
whose first column is $q_1$.
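A small numerical check of this QR connection may be helpful. The sketch below assumes NumPy and builds a symmetric $A = QTQ^T$ from a randomly chosen unreduced tridiagonal $T$ and a random orthogonal $Q$ (both purely illustrative), so that $q_1 = Qe_1$. It then confirms that $K(A,q_1,n) = QR$ with $R = [\,e_1, Te_1, \ldots, T^{n-1}e_1\,]$ upper triangular. The helper name `krylov_matrix` is hypothetical.

```python
import numpy as np

def krylov_matrix(A, q, k):
    """Return the n-by-k Krylov matrix [q, Aq, ..., A^{k-1} q]."""
    cols = [q]
    for _ in range(k - 1):
        cols.append(A @ cols[-1])
    return np.column_stack(cols)

rng = np.random.default_rng(0)
n = 6
alpha = rng.standard_normal(n)
beta = rng.uniform(0.5, 1.5, n - 1)            # nonzero subdiagonal: T is unreduced
T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))   # random orthogonal Q
A = Q @ T @ Q.T                                # Q^T A Q = T and Q e_1 = q_1
q1 = Q[:, 0]

K = krylov_matrix(A, q1, n)                    # K(A, q_1, n)
R = krylov_matrix(T, np.eye(n)[:, 0], n)       # [e_1, T e_1, ..., T^{n-1} e_1]

print(np.allclose(K, Q @ R))                   # K(A, q_1, n) = Q R
print(np.allclose(R, np.triu(R)))              # R is upper triangular
print(np.allclose(np.diag(R), np.concatenate(([1.0], np.cumprod(beta)))))
```

The last check reflects the fact that the diagonal of $R$ holds the products $\beta_1\cdots\beta_j$, so $R$ is nonsingular exactly when $T$ is unreduced. In practice one would not form $K(A,q_1,n)$ explicitly, since its columns quickly become nearly parallel.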

Householder tridiagonalization, discussed in §8.3.1, can be adapted for
this purpose. However, this approach is impractical if A is large and sparse 
because Householder similarity transformations tend to destroy sparsity. 
As a result, unacceptably large, dense matrices arise during the reduction. 

Loss of sparsity can sometimes be controlled by using Givens rather 
than Householder transformations. See Duff and Reid (1976). However, 
any method that computes $T$ by successively updating $A$ is not useful in
the majority of cases when A is sparse. 

This suggests that we try to compute the elements of the tridiagonal
matrix $T = Q^TAQ$ directly. Setting $Q = [\,q_1,\ldots,q_n\,]$ and
$$T = \begin{bmatrix} \alpha_1 & \beta_1 & & 0 \\ \beta_1 & \alpha_2 & \ddots & \\ & \ddots & \ddots & \beta_{n-1} \\ 0 & & \beta_{n-1} & \alpha_n \end{bmatrix}$$
and equating columns in $AQ = QT$, we find
$$Aq_k = \beta_{k-1}q_{k-1} + \alpha_kq_k + \beta_kq_{k+1}, \qquad \beta_0q_0 \equiv 0$$
for $k = 1{:}n-1$. The orthonormality of the $q_i$ implies $\alpha_k = q_k^TAq_k$.
Moreover, if $r_k = (A - \alpha_kI)q_k - \beta_{k-1}q_{k-1}$ is nonzero, then $q_{k+1} = r_k/\beta_k$
where $\beta_k = \pm\|r_k\|_2$. If $r_k = 0$, then the iteration breaks down but (as
we shall see) not without the acquisition of valuable invariant subspace
information. So by properly sequencing the above formulae we obtain the
Lanczos iteration:


    r_0 = q_1;  β_0 = 1;  q_0 = 0;  k = 0
    while (β_k ≠ 0)
        q_{k+1} = r_k/β_k;  k = k + 1;  α_k = q_k^T A q_k              (9.1.3)
        r_k = (A - α_k I)q_k - β_{k-1}q_{k-1};  β_k = ||r_k||_2
    end


There is no loss of generality in choosing the $\beta_k$ to be positive. The $q_k$ are
called Lanczos vectors.
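A direct transcription of (9.1.3) may make the recurrence concrete. This is a minimal NumPy sketch rather than production code: it performs no reorthogonalization, caps the number of steps with a hypothetical parameter `kmax` (a computed $\beta_k$ is almost never exactly zero), and treats $\beta_k \le$ `tol` as breakdown. The function name `lanczos` and the random test matrix are illustrative assumptions.

```python
import numpy as np

def lanczos(A, q1, kmax, tol=0.0):
    """Plain Lanczos iteration (9.1.3) without reorthogonalization.

    Returns alpha (diagonal of T_k), beta (beta_1, ..., beta_k), the Lanczos
    vectors q_1, ..., q_k as the columns of Q, and the last residual r_k.
    """
    n = A.shape[0]
    r = q1 / np.linalg.norm(q1)   # r_0 = q_1 (normalized)
    b = 1.0                       # beta_0 = 1
    q_prev = np.zeros(n)          # q_0 = 0
    b_prev = 0.0                  # so that beta_0 * q_0 contributes nothing
    alphas, betas, cols = [], [], []
    for _ in range(kmax):
        q = r / b                         # q_{k+1} = r_k / beta_k
        Aq = A @ q
        a = q @ Aq                        # alpha_k = q_k^T A q_k
        r = Aq - a * q - b_prev * q_prev  # r_k = (A - alpha_k I) q_k - beta_{k-1} q_{k-1}
        b = np.linalg.norm(r)             # beta_k = ||r_k||_2
        alphas.append(a); betas.append(b); cols.append(q)
        q_prev, b_prev = q, b
        if b <= tol:                      # breakdown: ran(Q_k) is invariant
            break
    return np.array(alphas), np.array(betas), np.column_stack(cols), r

# Usage: check the Lanczos relation (9.1.4), A Q_k = Q_k T_k + r_k e_k^T
rng = np.random.default_rng(1)
n = 50
X = rng.standard_normal((n, n))
A = (X + X.T) / 2
alpha, beta, Qk, rk = lanczos(A, rng.standard_normal(n), kmax=10)
k = len(alpha)
Tk = np.diag(alpha) + np.diag(beta[:-1], 1) + np.diag(beta[:-1], -1)
ek = np.zeros(k); ek[-1] = 1.0
print(np.linalg.norm(A @ Qk - Qk @ Tk - np.outer(rk, ek)))   # of the order of roundoff
```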


9.1.3 Termination and Error Bounds 


The iteration halts before complete tridiagonalization if $q_1$ is contained in
a proper invariant subspace. This is one of several mathematical properties 
of the method that we summarize in the following theorem. 
Theorem 9.1.1 Let $A \in \mathbb{R}^{n\times n}$ be symmetric and assume $q_1 \in \mathbb{R}^n$ has unit
2-norm. Then the Lanczos iteration (9.1.3) runs until $k = m$, where
$m = \mathrm{rank}(K(A,q_1,n))$. Moreover, for $k = 1{:}m$ we have
$$AQ_k = Q_kT_k + r_ke_k^T \qquad (9.1.4)$$
where
$$T_k = \begin{bmatrix} \alpha_1 & \beta_1 & & 0 \\ \beta_1 & \alpha_2 & \ddots & \\ & \ddots & \ddots & \beta_{k-1} \\ 0 & & \beta_{k-1} & \alpha_k \end{bmatrix}$$
and $Q_k = [\,q_1,\ldots,q_k\,]$ has orthonormal columns that span $K(A,q_1,k)$.


Proof. The proof is by induction on $k$. Suppose the iteration has produced
$Q_k = [\,q_1,\ldots,q_k\,]$ such that $\mathrm{ran}(Q_k) = K(A,q_1,k)$ and $Q_k^TQ_k = I_k$. It is
easy to see from (9.1.3) that (9.1.4) holds. Thus, $Q_k^TAQ_k = T_k + Q_k^Tr_ke_k^T$.
Since $\alpha_i = q_i^TAq_i$ for $i = 1{:}k$ and
$$q_{i+1}^TAq_i = q_{i+1}^T(Aq_i - \alpha_iq_i - \beta_{i-1}q_{i-1}) = q_{i+1}^T(\beta_iq_{i+1}) = \beta_i$$
for $i = 1{:}k-1$, we have $Q_k^TAQ_k = T_k$. Consequently, $Q_k^Tr_k = 0$.
If $r_k \neq 0$, then $q_{k+1} = r_k/\|r_k\|_2$ is orthogonal to $q_1,\ldots,q_k$ and
$$q_{k+1} \in \mathrm{span}\{Aq_k, q_k, q_{k-1}\} \subseteq K(A,q_1,k+1).$$
Thus, $Q_{k+1}^TQ_{k+1} = I_{k+1}$ and $\mathrm{ran}(Q_{k+1}) = K(A,q_1,k+1)$. On the other
hand, if $r_k = 0$, then $AQ_k = Q_kT_k$. This says that $\mathrm{ran}(Q_k) = K(A,q_1,k)$
is invariant. From this we conclude that $k = m = \mathrm{rank}(K(A,q_1,n))$. □


Encountering a zero $\beta_k$ in the Lanczos iteration is a welcome event in that it
signals the computation of an exact invariant subspace. However, an exact
zero or even a small $\beta_k$ is a rarity in practice. Nevertheless, the extremal
eigenvalues of $T_k$ turn out to be surprisingly good approximations to $A$'s
extremal eigenvalues. Consequently, other explanations for the convergence
of $T_k$'s eigenvalues must be sought. The following result is a step in this
direction. 


Theorem 9.1.2 Suppose that $k$ steps of the Lanczos algorithm have been
performed and that $S_k^TT_kS_k = \mathrm{diag}(\theta_1,\ldots,\theta_k)$ is the Schur decomposition
of the tridiagonal matrix $T_k$. If $Y_k = [\,y_1,\ldots,y_k\,] = Q_kS_k \in \mathbb{R}^{n\times k}$, then
for $i = 1{:}k$ we have $\|Ay_i - \theta_iy_i\|_2 = |\beta_k|\,|s_{ki}|$ where $S_k = (s_{pq})$.


Proof. Post-multiplying (9.1.4) by $S_k$ gives
$$AY_k = Y_k\,\mathrm{diag}(\theta_1,\ldots,\theta_k) + r_ke_k^TS_k,$$
and so $Ay_i = \theta_iy_i + r_k(e_k^TS_ke_i)$. The proof is complete by taking norms
and recalling that $\|r_k\|_2 = |\beta_k|$. □


The theorem provides computable error bounds for $T_k$'s eigenvalues:
$$\min_{\mu\in\lambda(A)} |\theta_i - \mu| \le |\beta_k|\,|s_{ki}|, \qquad i = 1{:}k.$$


Note that in the terminology of Theorem 8.1.15, the $(\theta_i, y_i)$ are Ritz pairs
for the subspace $\mathrm{ran}(Q_k)$.
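Theorem 9.1.2 and the bound that follows it can be checked numerically. The sketch below assumes NumPy and uses a diagonal test matrix with the known eigenvalues $1, 2, \ldots, n$ (an illustrative choice). It computes the Ritz pairs from the Schur decomposition of $T_k$, confirms $\|Ay_i - \theta_iy_i\|_2 = |\beta_k|\,|s_{ki}|$ up to roundoff, and checks that every $\theta_i$ lies within $|\beta_k|\,|s_{ki}|$ of some eigenvalue of $A$. The helper `lanczos` is a hypothetical plain transcription of (9.1.3) that assumes no breakdown.

```python
import numpy as np

def lanczos(A, q1, k):
    """k steps of the plain Lanczos iteration (9.1.3); assumes no breakdown."""
    n = A.shape[0]
    Q, alpha, beta = np.zeros((n, k)), np.zeros(k), np.zeros(k)
    q, q_prev, b_prev = q1 / np.linalg.norm(q1), np.zeros(n), 0.0
    for j in range(k):
        Q[:, j] = q
        w = A @ q
        alpha[j] = q @ w
        r = w - alpha[j] * q - b_prev * q_prev
        beta[j] = np.linalg.norm(r)
        q_prev, b_prev = q, beta[j]
        q = r / beta[j]
    return alpha, beta, Q, r

rng = np.random.default_rng(2)
n, k = 200, 20
lam = np.arange(1.0, n + 1)               # known eigenvalues 1, 2, ..., n
A = np.diag(lam)
alpha, beta, Q, r = lanczos(A, rng.standard_normal(n), k)
T = np.diag(alpha) + np.diag(beta[:-1], 1) + np.diag(beta[:-1], -1)

theta, S = np.linalg.eigh(T)              # S^T T S = diag(theta)
Y = Q @ S                                 # Ritz vectors y_i = Q_k S_k e_i
resid = np.linalg.norm(A @ Y - Y * theta, axis=0)   # ||A y_i - theta_i y_i||_2
bound = abs(beta[-1]) * np.abs(S[-1, :])            # |beta_k| |s_{ki}|
dist = np.min(np.abs(theta[:, None] - lam[None, :]), axis=1)

print(np.allclose(resid, bound))          # Theorem 9.1.2
print(np.all(dist <= bound + 1e-10))      # computable error bounds
```

Only the last row of $S_k$ and $\beta_k$ are needed for the bounds, which is why they can be monitored cheaply while the iteration runs.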

Another way that $T_k$ can be used to provide estimates of $A$'s eigenvalues
is described in Golub (1974) and involves the judicious construction of a
rank-one matrix $E$ such that $\mathrm{ran}(Q_k)$ is invariant for $A + E$. In particular,
if we use the Lanczos method to compute $AQ_k = Q_kT_k + r_ke_k^T$ and set
$E = \tau ww^T$, where $\tau = \pm 1$ and $w = aq_k + br_k$, then it can be shown that
$$(A + E)Q_k = Q_k(T_k + \tau a^2e_ke_k^T) + (1 + \tau ab)r_ke_k^T.$$
If $0 = 1 + \tau ab$, then the eigenvalues of $\tilde T_k = T_k + \tau a^2e_ke_k^T$, a tridiagonal
matrix, are also eigenvalues of $A + E$. Using Theorem 8.1.8 it can be shown
that the interval $[\lambda_i(\tilde T_k), \lambda_{i-1}(\tilde T_k)]$ contains an eigenvalue of $A$ for $i = 2{:}k$.

These bracketing intervals depend on the choice of $\tau a^2$. Suppose we
have an approximate eigenvalue $\tilde\lambda$ of $A$. One possibility is to choose
$\tau a^2$ so that
$$\det(\tilde T_k - \tilde\lambda I_k) = (\alpha_k + \tau a^2 - \tilde\lambda)\,p_{k-1}(\tilde\lambda) - \beta_{k-1}^2\,p_{k-2}(\tilde\lambda) = 0$$
where the polynomials $p_i(x) = \det(T_i - xI_i)$ can be evaluated at $\tilde\lambda$ using the
three-term recurrence (8.5.2). (This assumes that $p_{k-1}(\tilde\lambda) \neq 0$.) Eigenvalue
estimation in this spirit is discussed in Lehmann (1963) and Householder
(1968).
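The choice of $\tau a^2$ just described reduces to one division once $p_{k-1}(\tilde\lambda)$ and $p_{k-2}(\tilde\lambda)$ are known. The sketch below assumes NumPy, a random unreduced tridiagonal $T_k$, and an illustrative value $\tilde\lambda = 0.3$; the function names `char_polys` and `shift_for` are hypothetical. It evaluates the three-term recurrence for $p_i(x) = \det(T_i - xI_i)$, solves the determinant condition for $c = \tau a^2$, and checks that $\tilde\lambda$ is then an eigenvalue of $\tilde T_k$.

```python
import numpy as np

def char_polys(alpha, beta, x):
    """[p_0(x), ..., p_k(x)] with p_i(x) = det(T_i - x I_i), via the three-term
    recurrence p_i = (alpha_i - x) p_{i-1} - beta_{i-1}^2 p_{i-2}, p_0 = 1."""
    p = [1.0, alpha[0] - x]
    for i in range(1, len(alpha)):
        p.append((alpha[i] - x) * p[-1] - beta[i - 1] ** 2 * p[-2])
    return p

def shift_for(alpha, beta, lam):
    """Value c = tau*a^2 that makes lam an eigenvalue of T_k + c e_k e_k^T.
    Assumes k >= 2 and p_{k-1}(lam) != 0."""
    k = len(alpha)
    p = char_polys(alpha, beta, lam)
    # solve (alpha_k + c - lam) p_{k-1}(lam) - beta_{k-1}^2 p_{k-2}(lam) = 0 for c
    return lam - alpha[-1] + beta[-1] ** 2 * p[k - 2] / p[k - 1]

rng = np.random.default_rng(3)
k = 8
alpha = rng.standard_normal(k)
beta = rng.uniform(0.5, 1.5, k - 1)
lam_tilde = 0.3                      # hypothetical approximate eigenvalue of A
c = shift_for(alpha, beta, lam_tilde)
T_tilde = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
T_tilde[-1, -1] += c                 # T_k + c e_k e_k^T, with tau = sign(c), a^2 = |c|
print(np.min(np.abs(np.linalg.eigvalsh(T_tilde) - lam_tilde)))   # near zero
```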


9.1.4 The Kaniel-Paige Convergence Theory 


The preceding discussion indicates how eigenvalue estimates can be obtained
via the Lanczos algorithm, but it reveals nothing about the rate of convergence.
Results of this variety constitute what is known as the Kaniel-Paige
theory, a sample of which follows.


Theorem 9.1.3 Let $A$ be an $n$-by-$n$ symmetric matrix with eigenvalues
$\lambda_1 \ge \cdots \ge \lambda_n$ and corresponding orthonormal eigenvectors $z_1,\ldots,z_n$. If
$\theta_1 \ge \cdots \ge \theta_k$ are the eigenvalues of the matrix $T_k$ obtained after $k$ steps of
the Lanczos iteration, then
$$\lambda_1 \ge \theta_1 \ge \lambda_1 - \frac{(\lambda_1 - \lambda_n)\tan(\phi_1)^2}{\bigl(c_{k-1}(1 + 2\rho_1)\bigr)^2}$$
where $\cos(\phi_1) = |q_1^Tz_1|$, $\rho_1 = (\lambda_1 - \lambda_2)/(\lambda_2 - \lambda_n)$, and $c_{k-1}(x)$ is the
Chebyshev polynomial of degree $k-1$.
Proof. From Theorem 8.1.2, we have
$$\theta_1 = \max_{y\neq 0}\frac{y^TT_ky}{y^Ty} = \max_{y\neq 0}\frac{(Q_ky)^TA(Q_ky)}{(Q_ky)^T(Q_ky)} = \max_{0\neq w\in K(A,q_1,k)}\frac{w^TAw}{w^Tw}.$$
Since $\lambda_1$ is the maximum of $w^TAw/w^Tw$ over all nonzero $w$, it follows that
$\lambda_1 \ge \theta_1$. To obtain the lower bound for $\theta_1$, note that
$$\theta_1 = \max_{p\in\mathcal{P}_{k-1}}\frac{q_1^Tp(A)\,A\,p(A)q_1}{q_1^Tp(A)^2q_1}$$
where $\mathcal{P}_{k-1}$ is the set of polynomials of degree at most $k-1$. If $q_1 = \sum_{i=1}^n d_iz_i$ then
$$\frac{q_1^Tp(A)\,A\,p(A)q_1}{q_1^Tp(A)^2q_1} = \frac{\sum_{i=1}^n d_i^2p(\lambda_i)^2\lambda_i}{\sum_{i=1}^n d_i^2p(\lambda_i)^2} \ge \lambda_1 - (\lambda_1 - \lambda_n)\frac{\sum_{i=2}^n d_i^2p(\lambda_i)^2}{d_1^2p(\lambda_1)^2 + \sum_{i=2}^n d_i^2p(\lambda_i)^2}.$$
We can make the lower bound tight by selecting a polynomial $p(x)$ that is
large at $x = \lambda_1$ in comparison to its value at the remaining eigenvalues.
One way of doing this is to set
$$p(x) = c_{k-1}\left(-1 + 2\,\frac{x - \lambda_n}{\lambda_2 - \lambda_n}\right)$$
where $c_{k-1}(z)$ is the $(k-1)$-st Chebyshev polynomial generated via the
recursion
$$c_k(z) = 2zc_{k-1}(z) - c_{k-2}(z), \qquad c_0 = 1,\ c_1 = z.$$
These polynomials are bounded by unity on $[-1,1]$, but grow very rapidly
outside this interval. By defining $p(x)$ this way it follows that $|p(\lambda_i)|$ is
bounded by unity for $i = 2{:}n$, while $p(\lambda_1) = c_{k-1}(1 + 2\rho_1)$. Thus,
$$\theta_1 \ge \lambda_1 - (\lambda_1 - \lambda_n)\,\frac{1 - d_1^2}{d_1^2}\,\frac{1}{c_{k-1}(1 + 2\rho_1)^2}.$$
The desired lower bound is obtained by noting that $\tan(\phi_1)^2 = (1 - d_1^2)/d_1^2$. □


An analogous result pertaining to $\theta_k$ follows immediately from this theorem:
Corollary 9.1.4 Using the same notation as the theorem,
$$\lambda_n \le \theta_k \le \lambda_n + \frac{(\lambda_1 - \lambda_n)\tan(\phi_n)^2}{\bigl(c_{k-1}(1 + 2\rho_n)\bigr)^2}$$
where $\rho_n = (\lambda_{n-1} - \lambda_n)/(\lambda_1 - \lambda_{n-1})$ and $\cos(\phi_n) = |q_1^Tz_n|$.


Proof. Apply Theorem 9.1.3 with $A$ replaced by $-A$. □
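The two Kaniel-Paige bounds can be compared with what a plain Lanczos run actually delivers. The sketch below assumes NumPy and an illustrative diagonal test matrix (so $z_i = e_i$) with a well-separated $\lambda_1 = 1.5$ and the rest of the spectrum packed in $[0,1]$. It evaluates $c_{k-1}$ through $c_m(x) = \cosh(m\,\mathrm{arccosh}\,x)$, valid for $x \ge 1$. The helper names `lanczos_T` and `cheb` are hypothetical.

```python
import numpy as np

def lanczos_T(A, q1, k):
    """T_k from k steps of the plain Lanczos iteration (9.1.3); assumes no breakdown."""
    n = A.shape[0]
    alpha, beta = np.zeros(k), np.zeros(k)
    q, q_prev, b_prev = q1 / np.linalg.norm(q1), np.zeros(n), 0.0
    for j in range(k):
        w = A @ q
        alpha[j] = q @ w
        r = w - alpha[j] * q - b_prev * q_prev
        beta[j] = np.linalg.norm(r)
        q_prev, b_prev, q = q, beta[j], r / beta[j]
    return np.diag(alpha) + np.diag(beta[:-1], 1) + np.diag(beta[:-1], -1)

def cheb(m, x):
    """Chebyshev polynomial c_m(x), valid for x >= 1."""
    return np.cosh(m * np.arccosh(x))

rng = np.random.default_rng(3)
n, k = 400, 12
lam = np.concatenate(([1.5], np.linspace(1.0, 0.0, n - 1)))   # lambda_1 > ... > lambda_n
A = np.diag(lam)                          # eigenvectors z_i = e_i
q1 = rng.standard_normal(n); q1 /= np.linalg.norm(q1)
theta = np.linalg.eigvalsh(lanczos_T(A, q1, k))               # ascending order

rho1 = (lam[0] - lam[1]) / (lam[1] - lam[-1])
rhon = (lam[-2] - lam[-1]) / (lam[0] - lam[-2])
tan2_1 = (1 - q1[0] ** 2) / q1[0] ** 2    # tan(phi_1)^2 with cos(phi_1) = |q_1^T z_1|
tan2_n = (1 - q1[-1] ** 2) / q1[-1] ** 2  # tan(phi_n)^2 with cos(phi_n) = |q_1^T z_n|
lower_1 = lam[0] - (lam[0] - lam[-1]) * tan2_1 / cheb(k - 1, 1 + 2 * rho1) ** 2
upper_k = lam[-1] + (lam[0] - lam[-1]) * tan2_n / cheb(k - 1, 1 + 2 * rhon) ** 2
print(lam[0] - theta[-1], lam[0] - lower_1)     # error in theta_1 vs. Theorem 9.1.3
print(theta[0] - lam[-1], upper_k - lam[-1])    # error in theta_k vs. Corollary 9.1.4
```

With these eigenvalues $\rho_1$ is large while $\rho_n$ is tiny, so the bound for $\theta_1$ is already very small after a dozen steps whereas the bound for $\theta_k$ remains weak, which illustrates how strongly the rate depends on the relative gap.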


9.1.5 The Power Method Versus the Lanczos Method 


It is worthwhile to compare $\theta_1$ with the corresponding power method estimate
of $\lambda_1$. (See §8.2.1.) For clarity, assume $\lambda_1 \ge \cdots \ge \lambda_n \ge 0$. After $k-1$
power method steps applied to $q_1$, a vector is obtained in the direction of
$$v = A^{k-1}q_1 = \sum_{i=1}^n d_i\lambda_i^{k-1}z_i$$
along with an eigenvalue estimate
$$\gamma_1 = \frac{v^TAv}{v^Tv}.$$


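Using the proof and notation of Theorem 9.1.3, it is easy to show that
$$\lambda_1 \ge \gamma_1 \ge \lambda_1 - (\lambda_1 - \lambda_n)\tan(\phi_1)^2\left(\frac{\lambda_2}{\lambda_1}\right)^{2(k-1)}. \qquad (9.1.5)$$
(Hint: Set $p(x) = x^{k-1}$ in the proof.) Thus, we can compare the quality of
the lower bounds for $\theta_1$ and $\gamma_1$ by comparing
$$L_{k-1} \equiv 1\Big/\left[c_{k-1}\!\left(2\frac{\lambda_1}{\lambda_2} - 1\right)\right]^2 \ \ge\ 1\Big/\bigl[c_{k-1}(1 + 2\rho_1)\bigr]^2$$
and
$$R_{k-1} = \left(\frac{\lambda_2}{\lambda_1}\right)^{2(k-1)}.$$
This is done in Table 9.1.1 for representative values of $k$ and $\lambda_1/\lambda_2$.

The superiority of the Lanczos estimate is self-evident. This should
be no surprise, since $\theta_1$ is the maximum of $r(x) = x^TAx/x^Tx$ over all of
$K(A,q_1,k)$, while $\gamma_1 = r(v)$ for a particular $v$ in $K(A,q_1,k)$, namely
$v = A^{k-1}q_1$.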
λ1/λ2            k = 5        k = 10        k = 15        k = 20        k = 25

1.50  L_{k-1}    1.1×10^-4    2.0×10^-10    3.9×10^-16    7.4×10^-22    1.4×10^-27
      R_{k-1}    3.9×10^-2    6.8×10^-4     1.2×10^-5     2.0×10^-7     3.5×10^-9
1.10  L_{k-1}    2.7×10^-2    5.5×10^-5     1.1×10^-7     2.1×10^-10    4.2×10^-13
      R_{k-1}    4.7×10^-1    1.8×10^-1     6.9×10^-2     2.7×10^-2     1.0×10^-2
1.01  L_{k-1}    5.6×10^-1    1.0×10^-1     1.5×10^-2     2.0×10^-3     2.8×10^-4
      R_{k-1}    9.2×10^-1    8.4×10^-1     7.6×10^-1     6.9×10^-1     6.2×10^-1


TABLE 9.1.1  $L_{k-1}/R_{k-1}$
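The entries of Table 9.1.1 can be regenerated directly from the definitions of $L_{k-1}$ and $R_{k-1}$. The sketch below assumes NumPy and evaluates $c_{k-1}$ through the identity $c_m(x) = \cosh(m\,\mathrm{arccosh}\,x)$, which holds for $x \ge 1$; the function names are illustrative.

```python
import numpy as np

def cheb(m, x):
    """Chebyshev polynomial c_m(x) for x >= 1, via c_m(x) = cosh(m arccosh x)."""
    return np.cosh(m * np.arccosh(x))

def L(ratio, k):
    """L_{k-1} = 1 / [c_{k-1}(2 lambda_1/lambda_2 - 1)]^2 (Lanczos factor)."""
    return 1.0 / cheb(k - 1, 2.0 * ratio - 1.0) ** 2

def R(ratio, k):
    """R_{k-1} = (lambda_2/lambda_1)^(2(k-1)) (power method factor)."""
    return (1.0 / ratio) ** (2 * (k - 1))

for ratio in (1.50, 1.10, 1.01):
    print(f"{ratio:.2f}  L:", "  ".join(f"{L(ratio, k):.1e}" for k in (5, 10, 15, 20, 25)))
    print("      R:", "  ".join(f"{R(ratio, k):.1e}" for k in (5, 10, 15, 20, 25)))
```

For instance, $c_4(2) = 97$, so $L_4 = 1/97^2 \approx 1.1\times 10^{-4}$ for $\lambda_1/\lambda_2 = 1.5$.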


9.1.6 Convergence of Interior Eigenvalues 


We conclude with some remarks about error bounds for $T_k$'s interior eigenvalues.
The key idea in the proof of Theorem 9.1.3 is the use of the translated
Chebyshev polynomial. With this polynomial we amplified the component
of $q_1$ in the direction $z_1$. A similar idea can be used to obtain bounds
for an interior Ritz value $\theta_i$. However, the bounds are not as satisfactory
because the "amplifying polynomial" has the form $q(x)\prod_{j=1}^{i-1}(x - \lambda_j)$, where
$q(x)$ is the Chebyshev polynomial of degree $k-i$ on the interval
$[\lambda_{i+1}, \lambda_n]$. For details, see Kaniel (1966), Paige (1971), or Saad (1980).


Problems 


P9.1.1 Suppose $A \in \mathbb{R}^{n\times n}$ is skew-symmetric. Derive a Lanczos-like algorithm for
computing a skew-symmetric tridiagonal matrix $T_m$ such that $AQ_m = Q_mT_m$, where
$Q_m^TQ_m = I_m$.

P9.1.2 Let $A \in \mathbb{R}^{n\times n}$ be symmetric and define $r(x) = x^TAx/x^Tx$. Suppose $S \subseteq \mathbb{R}^n$
is a subspace with the property that $x \in S$ implies $\nabla r(x) \in S$. Show that $S$ is invariant
for $A$.

P9.1.3 Show that if a symmetric matrix $A \in \mathbb{R}^{n\times n}$ has a multiple eigenvalue, then the
Lanczos iteration terminates prematurely.

P9.1.4 Show that the index m in Theorem 9.1.1 is the dimension of the smallest in- 
variant subspace for A that contains q1. 


P9.1.5 Let $A \in \mathbb{R}^{n\times n}$ be symmetric and consider the problem of determining an
orthonormal sequence $q_1, q_2, \ldots$ with the property that once $Q_k = [\,q_1,\ldots,q_k\,]$ is known,
$q_{k+1}$ is chosen so as to minimize $\mu_k = \|(I - Q_{k+1}Q_{k+1}^T)AQ_k\|_F$. Show that if
$\mathrm{span}\{q_1,\ldots,q_k\} = K(A,q_1,k)$, then it is possible to choose $q_{k+1}$ so $\mu_k = 0$. Explain
how this optimization problem leads to the Lanczos iteration.


P9.1.6 Suppose $A \in \mathbb{R}^{n\times n}$ is symmetric and that we wish to compute its largest
eigenvalue. Let $\eta$ be an approximate eigenvector and set
$$\alpha = \frac{\eta^TA\eta}{\eta^T\eta}, \qquad z = A\eta - \alpha\eta.$$
(a) Show that the interval $[\alpha - \delta, \alpha + \delta]$ must contain an eigenvalue of $A$ where
$\delta = \|z\|_2/\|\eta\|_2$. (b) Consider the new approximation $\tilde\eta = a\eta + bz$ and show how to
determine the scalars $a$ and $b$ so that
$$\tilde\alpha = \frac{\tilde\eta^TA\tilde\eta}{\tilde\eta^T\tilde\eta}$$
is maximized. (c) Relate the above computations to the first two steps of the Lanczos
process.


Notes and References for Sec. 9.1 


The classic reference for the Lanczos method is 


C. Lanczos (1950). “An Iteration Method for the Solution of the Eigenvalue Problem of 
Linear Differential and Integral Operators,” J. Res. Nat. Bur. Stand. 45, 255-82. 


Although the convergence of the Ritz values is alluded to in this paper, for more details we
refer the reader to 


S. Kaniel (1966). “Estimates for Some Computational Techniques in Linear Algebra,” 
Math. Comp. 20, 369-78. 

C.C. Paige (1971). “The Computation of Eigenvalues and Eigenvectors of Very Large 
Sparse Matrices,” Ph.D. thesis, London University. 

Y. Saad (1980). “On the Rates of Convergence of the Lanczos and the Block Lanczos 
Methods,” SIAM J. Num. Anal. 17, 687-706.


The connections between the Lanczos algorithm, orthogonal polynomials, and the theory 
of moments are discussed in 


N.J. Lehmann (1963). “Optimale Eigenwerteinschliessungen,” Numer. Math. 5, 246-72. 

A.S. Householder (1968). “Moments and characteristic Roots II,” Numer. Math. 11, 
126-28. 

G.H. Golub (1974). “Some Uses of the Lanczos Algorithm in Numerical Linear Algebra,” 
in Topics in Numerical Analysis, ed., J.J.H. Miller, Academic Press, New York. 


We motivated our discussion of the Lanczos algorithm by discussing the inevitability of 
fill-in when Householder or Givens transformations are used to tridiagonalize. Actually, 
fill-in can sometimes be kept to an acceptable level if care is exercised. See 


I.S. Duff (1974). “Pivot Selection and Row Ordering in Givens Reduction on Sparse
Matrices,” Computing 13, 239-48. 

I.S. Duff and J.K. Reid (1976). “A Comparison of Some Methods for the Solution of
Sparse Over-Determined Systems of Linear Equations,” J. Inst. Maths. Applic. 17,
267-80. 

L. Kaufman (1979). “Application of Dense Householder Transformations to a Sparse 
Matrix,” ACM Trans. Math. Soft. 5, 442-50. 


9.2 Practical Lanczos Procedures 


Rounding errors greatly affect the behavior of the Lanczos iteration. The 
basic difficulty is caused by loss of orthogonality among the Lanczos vectors, 
a phenomenon that muddies the issue of termination and complicates the 
relationship between A's eigenvalues and those of the tridiagonal matrices 
$T_k$. This troublesome feature, coupled with the advent of Householder's
perfectly stable method of tridiagonalization, explains why the Lanczos
algorithm was disregarded by numerical analysts during the 1950’s and
1960’s. However, interest in the method was rejuvenated with the devel- 
opment of the Kaniel-Paige theory and because the pressure to solve large, 
sparse eigenproblems increased with increased computer power. With many 
fewer than n iterations typically required to get good approximate extremal 
eigenvalues, the Lanczos method became attractive as a sparse matrix tech- 
nique rather than as a competitor of the Householder approach. 

Successful implementations of the Lanczos iteration involve much more 
than a simple encoding of (9.1.3). In this section we outline some of the 
practical ideas that have been proposed to make the Lanczos procedure viable
in practice. 


9.2.1 Exact Arithmetic Implementation


With careful overwriting in (9.1.3) and exploitation of the formula
$$\alpha_k = q_k^T(Aq_k - \beta_{k-1}q_{k-1}),$$
the whole Lanczos process can be implemented with just two $n$-vectors of
storage.


Algorithm 9.2.1 (The Lanczos Algorithm) Given a symmetric
$A \in \mathbb{R}^{n\times n}$ and $w \in \mathbb{R}^n$ having unit 2-norm, the following algorithm computes
a $k$-by-$k$ symmetric tridiagonal matrix $T_k$ with the property that
$\lambda(T_k) \subset \lambda(A)$. It assumes the existence of a function A.mult($w$) that
returns the matrix-vector product $Aw$. The diagonal and subdiagonal elements
of $T_k$ are stored in $\alpha(1{:}k)$ and $\beta(1{:}k-1)$ respectively.


    v(1:n) = 0;  β_0 = 1;  k = 0
    while β_k ≠ 0
        if k ≠ 0
            for i = 1:n
                t = w_i;  w_i = v_i/β_k;  v_i = -β_k t
            end
        end
        v = v + A.mult(w)
        k = k + 1;  α_k = w^T v;  v = v - α_k w;  β_k = ||v||_2
    end


Note that $A$ is not altered during the entire process. Only a procedure
A.mult($\cdot$) for computing matrix-vector products involving $A$ need be supplied.
If $A$ has an average of about $\nu$ nonzeros per row, then approximately
$(2\nu + 8)n$ flops are involved in a single Lanczos step.

Upon termination the eigenvalues of $T_k$ can be found using the symmetric
tridiagonal QR algorithm or any of the special methods of §8.5, such as
bisection.
The Lanczos vectors are generated in the $n$-vector $w$. If they are desired
for later use, then special arrangements must be made for their storage. In
the typical sparse matrix setting they could be stored on a disk or some 
other secondary storage device until required. 
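Algorithm 9.2.1 translates almost line for line into NumPy. In the sketch below the procedure A.mult is modeled by a Python callable `a_mult` (a hypothetical name), the loop over $i$ is replaced by an equivalent vectorized swap of $w$ and $v$, and a hypothetical cap `kmax` guards against a computed $\beta_k$ that never becomes exactly zero. The test operator is a diagonal matrix applied elementwise, so $A$ is never formed; its eigenvalues (largest $10$, then $9$) are an illustrative choice.

```python
import numpy as np

def lanczos_two_vectors(a_mult, w, kmax):
    """Algorithm 9.2.1: A is accessed only through a_mult(x) = A @ x.

    w is the unit-norm starting vector; only the vectors w and v are kept.
    Returns alpha(1:k) and beta(1:k); T_k uses beta(1:k-1) off the diagonal.
    """
    w = np.array(w, dtype=float)
    v = np.zeros_like(w)
    alpha, beta = [], []
    b, k = 1.0, 0                    # beta_0 = 1
    while b != 0.0 and k < kmax:
        if k != 0:
            # w <- v / beta_k (next Lanczos vector), v <- -beta_k * (old w)
            w, v = v / b, -b * w
        v = v + a_mult(w)
        k += 1
        a = w @ v                    # alpha_k = q_k^T (A q_k - beta_{k-1} q_{k-1})
        v = v - a * w                # v now holds r_k
        b = np.linalg.norm(v)        # beta_k = ||r_k||_2
        alpha.append(a)
        beta.append(b)
    return np.array(alpha), np.array(beta)

d = np.concatenate(([10.0, 9.0], np.linspace(0.0, 1.0, 98)))   # eigenvalues of a diagonal A
w0 = np.ones(d.size) / np.sqrt(d.size)
alpha, beta = lanczos_two_vectors(lambda x: d * x, w0, kmax=30)
T = np.diag(alpha) + np.diag(beta[:-1], 1) + np.diag(beta[:-1], -1)
print(np.linalg.eigvalsh(T)[-1])     # close to the largest eigenvalue, 10
```

Because only `a_mult` touches the matrix, the same function works unchanged for any sparse or matrix-free operator that can supply products $Ax$.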


9.2.2  Roundoff Properties 


The development of a practical, easy-to-use Lanczos procedure requires 
an appreciation of the fundamental error analyses of Paige (1971, 1976, 
1980). An examination of his results is the best way to motivate the several 
modified Lanczos procedures of this section. 

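After $k$ steps of the algorithm we obtain the matrix of computed Lanczos
vectors $\hat Q_k = [\,\hat q_1,\ldots,\hat q_k\,]$ and the associated tridiagonal matrix
$$\hat T_k = \begin{bmatrix} \hat\alpha_1 & \hat\beta_1 & & 0 \\ \hat\beta_1 & \hat\alpha_2 & \ddots & \\ & \ddots & \ddots & \hat\beta_{k-1} \\ 0 & & \hat\beta_{k-1} & \hat\alpha_k \end{bmatrix}.$$
Paige (1971, 1976) shows that if $\hat r_k$ is the computed analog of $r_k$, then
$$A\hat Q_k = \hat Q_k\hat T_k + \hat r_ke_k^T + E_k \qquad (9.2.1)$$
where
$$\|E_k\|_2 \approx u\|A\|_2. \qquad (9.2.2)$$
This indicates that the important equation $AQ_k = Q_kT_k + r_ke_k^T$ is satisfied
to working precision.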

Unfortunately, the picture is much less rosy with respect to the orthogonality
among the $\hat q_i$. (Normality is not an issue. The computed Lanczos
vectors essentially have unit length.) If $\hat\beta_k = fl(\|\hat r_k\|_2)$ and we compute
$\hat q_{k+1} = fl(\hat r_k/\hat\beta_k)$, then a simple analysis shows that $\hat\beta_k\hat q_{k+1} \approx \hat r_k + w_k$
where $\|w_k\|_2 \approx u\|\hat r_k\|_2 \approx u\|A\|_2$. Thus, we may conclude that
$$|\hat q_{k+1}^T\hat q_i| \approx \frac{|\hat r_k^T\hat q_i| + u\|A\|_2}{|\hat\beta_k|}$$
for $i = 1{:}k$. In other words, significant departures from orthogonality can
be expected when $\hat\beta_k$ is small, even in the ideal situation where $\hat r_k^T\hat Q_k$ is
zero. A small $\hat\beta_k$ implies cancellation in the computation of $\hat r_k$. We stress
that loss of orthogonality is due to this cancellation and is not the result of
the gradual accumulation of roundoff error.
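Loss of orthogonality is easy to observe in double precision. The sketch below assumes NumPy and an illustrative diagonal matrix whose largest eigenvalue $10$ is well separated from the rest of the spectrum in $[0,1]$, so the leading Ritz value converges within a few steps. It runs the plain recurrence (9.1.3) and reports $\max_{ij}|(I - \hat Q_k^T\hat Q_k)_{ij}|$ as $k$ grows. The helpers `lanczos_vectors` and `orth_loss` are hypothetical names.

```python
import numpy as np

def lanczos_vectors(A, q1, k):
    """Computed Lanczos vectors from k steps of (9.1.3), no reorthogonalization."""
    n = A.shape[0]
    Q = np.zeros((n, k))
    q, q_prev, b_prev = q1 / np.linalg.norm(q1), np.zeros(n), 0.0
    for j in range(k):
        Q[:, j] = q
        w = A @ q
        a = q @ w
        r = w - a * q - b_prev * q_prev
        q_prev, b_prev = q, np.linalg.norm(r)
        q = r / b_prev
    return Q

def orth_loss(Q):
    """max |(I - Q^T Q)_{ij}|: a simple measure of lost orthogonality."""
    return np.max(np.abs(np.eye(Q.shape[1]) - Q.T @ Q))

rng = np.random.default_rng(5)
d = np.concatenate(([10.0], rng.uniform(0.0, 1.0, 199)))   # lambda_1 = 10 is well separated
A = np.diag(d)
Q = lanczos_vectors(A, np.ones(d.size), 40)
for k in (2, 5, 10, 20, 40):
    print(k, orth_loss(Q[:, :k]))
```

The measure typically stays near roundoff level for the first few steps and then grows to a significant size once the leading Ritz pair has converged, which anticipates the connection drawn in §9.2.4.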


Example 9.2.1 The matrix
$$A = \begin{bmatrix} 2.64 & -.48 \\ -.48 & 2.36 \end{bmatrix}$$
has eigenvalues $\lambda_1 = 3$ and $\lambda_2 = 2$. If the Lanczos algorithm is applied to this matrix
with $q_1 = [\,.810,\ -.586\,]^T$ and three-digit floating point arithmetic is performed, then
$\hat q_2 = [\,.707,\ .707\,]^T$. Loss of orthogonality occurs because $\mathrm{span}\{q_1\}$ is almost invariant
for $A$. (The vector $x = [\,.8,\ -.6\,]^T$ is the eigenvector affiliated with $\lambda_1$.)


Further details of the Paige analysis are given shortly. Suffice it to
say now that loss of orthogonality always occurs in practice and with it,
an apparent deterioration in the quality of $\hat T_k$'s eigenvalues. This can be
quantified by combining (9.2.1) with Theorem 8.1.16. In particular, if in
that theorem we set $F_1 = \hat r_ke_k^T + E_k$, $X_1 = \hat Q_k$, $S = \hat T_k$, and assume that
$$\tau = \|\hat Q_k^T\hat Q_k - I_k\|_2$$
satisfies $\tau < 1$, then there exist eigenvalues $\mu_1,\ldots,\mu_k \in \lambda(A)$ such that
$$|\mu_i - \lambda_i(\hat T_k)| \le \sqrt{2}\,\bigl(\|\hat r_k\|_2 + \|E_k\|_2 + \tau(2 + \tau)\|A\|_2\bigr)$$
for $i = 1{:}k$. An obvious way to control the $\tau$ factor is to orthogonalize
each newly computed Lanczos vector against its predecessors. This leads
directly to our first “practical” Lanczos procedure.


9.2.3 Lanczos with Complete Reorthogonalization


Let $r_0,\ldots,r_{k-1} \in \mathbb{R}^n$ be given and suppose that Householder matrices
$H_0,\ldots,H_{k-1}$ have been computed such that $(H_0\cdots H_{k-1})^T[\,r_0,\ldots,r_{k-1}\,]$
is upper triangular. Let $[\,q_1,\ldots,q_k\,]$ denote the first $k$ columns of the
Householder product $(H_0\cdots H_{k-1})$. Now suppose that we are given a vector
$r_k \in \mathbb{R}^n$ and wish to compute a unit vector $q_{k+1}$ in the direction of
$$w = r_k - \sum_{i=1}^k (q_i^Tr_k)q_i \in \mathrm{span}\{q_1,\ldots,q_k\}^\perp.$$
If a Householder matrix $H_k$ is determined so $(H_0\cdots H_k)^T[\,r_0,\ldots,r_k\,]$ is
upper triangular, then it follows that column $(k+1)$ of $H_0\cdots H_k$ is the
desired unit vector.

If we incorporate these Householder computations into the Lanczos pro- 
cess, then we can produce Lanczos vectors that are orthogonal to machine 
precision: 
    r_0 = q_1 (given unit vector)
    Determine Householder H_0 so H_0 r_0 = e_1.
    α_1 = q_1^T A q_1
    for k = 1:n-1
        r_k = (A - α_k I)q_k - β_{k-1}q_{k-1}        (β_0 q_0 ≡ 0)
        w = (H_{k-1} ··· H_0)r_k                                       (9.2.3)
        Determine Householder H_k so H_k w = (w_1, ..., w_k, β_k, 0, ..., 0)^T.
        q_{k+1} = H_0 ··· H_k e_{k+1};  α_{k+1} = q_{k+1}^T A q_{k+1}
    end


This is an example of a complete reorthogonalization Lanczos scheme. A
thorough analysis may be found in Paige (1970). The idea of using House- 
holder matrices to enforce orthogonality appears in Golub, Underwood, and 
Wilkinson (1972). 

That the computed $\hat q_j$ in (9.2.3) are orthogonal to working precision
follows from the roundoff properties of Householder matrices. Note that by
virtue of the definition of $q_{k+1}$, it makes no difference if $\beta_k = 0$. For this
reason, the algorithm may safely run until k = n — 1. (However, in practice 
one would terminate for a much smaller value of k.) 
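Scheme (9.2.3) can be sketched with explicit Householder vectors. In the NumPy sketch below, `house` and `lanczos_full_reorth` are hypothetical names; each $H_j$ is stored as a vector acting on rows $j{:}n$, and $q_{k+1} = H_0\cdots H_ke_{k+1}$ is formed by applying the stored reflectors to a unit vector. Two details differ cosmetically from the text: $H_0$ maps $r_0$ to $\pm e_1$ (the sign is chosen to avoid cancellation), and $\beta_k$ may come out negative, which leaves $\lambda(T_k)$ unchanged. The test matrix is the same illustrative diagonal matrix for which plain Lanczos loses orthogonality quickly.

```python
import numpy as np

def house(x):
    """Return (v, b, s) with (I - b v v^T) x = s e_1 and |s| = ||x||_2."""
    v = np.array(x, dtype=float)
    sigma = np.linalg.norm(v)
    if sigma == 0.0:
        return v, 0.0, 0.0                 # x = 0: take H = I
    s = -sigma if v[0] >= 0 else sigma     # sign choice avoids cancellation in v[0]
    v[0] -= s
    return v, 2.0 / (v @ v), s

def lanczos_full_reorth(A, q1, k):
    """k steps of the Householder-based complete reorthogonalization scheme (9.2.3)."""
    n = A.shape[0]
    vs, bs = [], []                        # H_j = I - bs[j] vs[j] vs[j]^T acts on rows j:n

    def apply_H(j, y):
        y = y.copy()
        y[j:] -= bs[j] * vs[j] * (vs[j] @ y[j:])
        return y

    def q_col(m):                          # q_{m+1} = H_0 ... H_m e_{m+1} (0-based m)
        y = np.zeros(n); y[m] = 1.0
        for j in range(m, -1, -1):
            y = apply_H(j, y)
        return y

    v, b, _ = house(q1 / np.linalg.norm(q1))   # H_0 r_0 = +-e_1, so q_1 = +-r_0
    vs.append(v); bs.append(b)
    Q = np.zeros((n, k)); alpha = np.zeros(k); beta = np.zeros(k - 1)
    Q[:, 0] = q_col(0)
    alpha[0] = Q[:, 0] @ (A @ Q[:, 0])
    for m in range(1, k):
        r = A @ Q[:, m - 1] - alpha[m - 1] * Q[:, m - 1]
        if m >= 2:
            r -= beta[m - 2] * Q[:, m - 2]
        w = r
        for j in range(m):                 # w = H_{m-1} ... H_0 r_m
            w = apply_H(j, w)
        v, b, s = house(w[m:])             # H_m w = (w_1, ..., w_m, s, 0, ..., 0)^T
        vs.append(v); bs.append(b)
        beta[m - 1] = s                    # may be negative; lambda(T_k) is unaffected
        Q[:, m] = q_col(m)
        alpha[m] = Q[:, m] @ (A @ Q[:, m])
    T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
    return Q, T

rng = np.random.default_rng(5)
d = np.concatenate(([10.0], rng.uniform(0.0, 1.0, 199)))
A = np.diag(d)
Q, T = lanczos_full_reorth(A, np.ones(d.size), 40)
print(np.max(np.abs(np.eye(40) - Q.T @ Q)))   # stays near machine precision
print(np.linalg.eigvalsh(T)[-1])              # approximately 10
```

Forming each $q_{k+1}$ touches all stored reflectors, which is exactly the $O(kn)$ per-step overhead and data traffic discussed next.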

Of course, in any implementation of (9.2.3), one stores the Householder
vectors $v_k$ and never explicitly forms the corresponding $H_k$. Since we have
$H_k(1{:}k,1{:}k) = I_k$, there is no need to compute the first $k$ components of
$w = (H_{k-1}\cdots H_0)r_k$, for in exact arithmetic these components would be
zero.

Unfortunately, these economies make but a small dent in the computa- 
tional overhead associated with complete reorthogonalization. The House- 
holder calculations increase the work in the kth Lanczos step by O(kn) 
flops. Moreover, to compute $q_{k+1}$, the Householder vectors associated with
$H_0,\ldots,H_k$ must be accessed. For large $n$ and $k$, this usually implies a
prohibitive amount of data transfer. 

Thus, there is a high price associated with complete reorthogonalization. 
Fortunately, there are more effective courses of action to take, but these 
demand that we look more closely at how orthogonality is lost. 


9.2.4 Selective Orthogonalization 


A remarkable, ironic consequence of the Paige (1971) error analysis is that
loss of orthogonality goes hand in hand with convergence of a Ritz pair.
To be precise, suppose the symmetric QR algorithm is applied to $\hat T_k$ and
renders computed Ritz values $\hat\theta_1,\ldots,\hat\theta_k$ and a nearly orthogonal matrix of
eigenvectors $\hat S_k = (\hat s_{pq})$. If $\hat Y_k = [\,\hat y_1,\ldots,\hat y_k\,] = fl(\hat Q_k\hat S_k)$, then it can be
shown that for $i = 1{:}k$ we have
$$|\hat q_{k+1}^T\hat y_i| \approx \frac{u\|A\|_2}{|\hat\beta_k|\,|\hat s_{ki}|} \qquad (9.2.4)$$
and
$$\|A\hat y_i - \hat\theta_i\hat y_i\|_2 \approx |\hat\beta_k|\,|\hat s_{ki}|. \qquad (9.2.5)$$


That is, the most recently computed Lanczos vector $\hat q_{k+1}$ tends to have a
nontrivial and unwanted component in the direction of any converged Ritz
vector. Consequently, instead of orthogonalizing $\hat q_{k+1}$ against all of the
previously computed Lanczos vectors, we can achieve the same effect by
orthogonalizing it against the much smaller set of converged Ritz vectors.

The practical aspects of enforcing orthogonality in this way are dis- 
cussed in Parlett and Scott (1979). In their scheme, known as selective 
orthogonalization, a computed Ritz pair $(\hat\theta, \hat y)$ is called “good” if it satisfies
$$\|A\hat y - \hat\theta\hat y\|_2 \le \sqrt{u}\,\|A\|_2.$$
As soon as $\hat q_{k+1}$ is computed, it is orthogonalized against each good Ritz
vector. This is much less costly than complete reorthogonalization, since
there are usually many fewer good Ritz vectors than Lanczos vectors.

One way to implement selective orthogonalization is to diagonalize $\hat T_k$ at
each step and then examine the $\hat s_{ki}$ in light of (9.2.4) and (9.2.5). A much
more efficient approach is to estimate the loss-of-orthogonality measure
$\|I_k - \hat Q_k^T\hat Q_k\|_2$ using the following result:


Lemma 9.2.1 Suppose $S_+ = [\,S\ \ d\,]$ where $S \in \mathbb{R}^{n\times k}$ and $d \in \mathbb{R}^n$. If $S$
satisfies $\|I_k - S^TS\|_2 \le \mu$ and $|1 - d^Td| \le \delta$, then $\|I_{k+1} - S_+^TS_+\|_2 \le \mu_+$
where
$$\mu_+ = \frac{1}{2}\left(\mu + \delta + \sqrt{(\mu - \delta)^2 + 4\|S^Td\|_2^2}\right).$$
Proof. See Kahan and Parlett (1974) or Parlett and Scott (1979). □
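The bound of Lemma 9.2.1 is cheap to evaluate and easy to test. The sketch below assumes NumPy and builds an illustrative $S$ with slightly non-orthonormal columns and a unit vector $d$ that has a small component along $\mathrm{ran}(S)$; the function name `mu_plus` is hypothetical. It compares $\mu_+$ with the exact value of $\|I_{k+1} - S_+^TS_+\|_2$.

```python
import numpy as np

def mu_plus(mu, delta, std_norm):
    """Bound of Lemma 9.2.1 on ||I_{k+1} - S_+^T S_+||_2, given ||S^T d||_2."""
    return 0.5 * (mu + delta + np.sqrt((mu - delta) ** 2 + 4.0 * std_norm ** 2))

rng = np.random.default_rng(6)
n, k = 100, 10
S0, _ = np.linalg.qr(rng.standard_normal((n, k)))
d = rng.standard_normal(n)
d -= S0 @ (S0.T @ d); d /= np.linalg.norm(d)     # unit vector orthogonal to ran(S0)
S = S0 + 1e-6 * rng.standard_normal((n, k))       # slightly non-orthonormal columns
d = d + 1e-4 * S0[:, 0]                           # small component along ran(S0)

mu = np.linalg.norm(np.eye(k) - S.T @ S, 2)
delta = abs(1.0 - d @ d)
bound = mu_plus(mu, delta, np.linalg.norm(S.T @ d))
Splus = np.column_stack([S, d])
actual = np.linalg.norm(np.eye(k + 1) - Splus.T @ Splus, 2)
print(actual, bound)                              # actual <= bound
```

In a selective orthogonalization code one would not form $S^Td$ explicitly as done here for checking; the text next describes how its norm can be estimated by a recurrence.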


Thus, if we have a bound for $\|I_k - \hat Q_k^T\hat Q_k\|_2$ we can generate a bound for
$\|I_{k+1} - \hat Q_{k+1}^T\hat Q_{k+1}\|_2$ by applying the lemma with $S = \hat Q_k$ and $d = \hat q_{k+1}$.
(In this case $\delta \approx u$ and we assume that $\hat q_{k+1}$ has been orthogonalized against
the set of currently good Ritz vectors.) It is possible to estimate the norm
of $\hat Q_k^T\hat q_{k+1}$ from a simple recurrence that spares one the need for accessing
$\hat q_1,\ldots,\hat q_k$. See Kahan and Parlett (1974) or Parlett and Scott (1979). The
overhead is minimal, and when the bounds signal loss of orthogonality, it is
time to contemplate the enlargement of the set of good Ritz vectors. Then
and only then is $\hat T_k$ diagonalized.


9.2.5 The Ghost Eigenvalue Problem 


Considerable effort has been spent in trying to develop a workable Lanczos
procedure that does not involve any kind of orthogonality enforcement.
Research in this direction focuses on the problem of “ghost” or “spurious”
eigenvalues. These are multiple eigenvalues of $\hat T_k$ that correspond to simple
eigenvalues of $A$. They arise because the iteration essentially restarts
itself when orthogonality to a converged Ritz vector is lost. (By way of 
analogy, consider what would happen during orthogonal iteration §8.2.8 if 
we “forgot” to orthogonalize.) 

The problem of identifying ghost eigenvalues and coping with their pres- 
ence is discussed in Cullum and Willoughby (1979) and Parlett and Reid 
(1981). It is a particularly pressing problem in those applications where all 
of A’s eigenvalues are desired, for then the above orthogonalization proce- 
dures are too expensive to implement. 
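Ghost eigenvalues appear readily with the plain recurrence. The NumPy sketch below reuses the illustrative diagonal matrix with a simple, well-separated eigenvalue $\lambda_1 = 10$ and counts how many eigenvalues of $\hat T_k$ lie within $10^{-6}$ of it. The helper `lanczos_T` is a hypothetical plain transcription of (9.1.3) with no orthogonality enforcement; the tolerance is an arbitrary illustrative choice.

```python
import numpy as np

def lanczos_T(A, q1, k):
    """T_k from k steps of the plain Lanczos iteration (9.1.3); assumes no breakdown."""
    n = A.shape[0]
    alpha, beta = np.zeros(k), np.zeros(k)
    q, q_prev, b_prev = q1 / np.linalg.norm(q1), np.zeros(n), 0.0
    for j in range(k):
        w = A @ q
        alpha[j] = q @ w
        r = w - alpha[j] * q - b_prev * q_prev
        beta[j] = np.linalg.norm(r)
        q_prev, b_prev, q = q, beta[j], r / beta[j]
    return np.diag(alpha) + np.diag(beta[:-1], 1) + np.diag(beta[:-1], -1)

rng = np.random.default_rng(8)
d = np.concatenate(([10.0], rng.uniform(0.0, 1.0, 199)))   # lambda_1 = 10 is simple
A = np.diag(d)
counts = {}
for k in (6, 100):
    theta = np.linalg.eigvalsh(lanczos_T(A, np.ones(d.size), k))
    counts[k] = int(np.sum(np.abs(theta - 10.0) < 1e-6))  # Ritz values copying lambda_1
print(counts)
```

After a few steps there is a single Ritz value near $10$; once orthogonality to the converged Ritz vector is lost, the iteration reconverges to the same eigenvalue and additional copies appear, even though $\lambda_1$ is a simple eigenvalue of $A$.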

Difficulties with the Lanczos iteration can be expected even if A has a 
genuinely multiple eigenvalue. This follows because the $T_k$ are unreduced,
and unreduced tridiagonal matrices cannot have multiple eigenvalues. Our 
next practical Lanczos procedure attempts to circumvent this difficulty. 


9.2.6 Block Lanczos 


Just as the simple power method has a block analog in simultaneous itera- 
tion, so does the Lanczos algorithm have a block version. Suppose n = rp 
and consider the decomposition 


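$$Q^TAQ = T = \begin{bmatrix} M_1 & B_1^T & & 0 \\ B_1 & M_2 & \ddots & \\ & \ddots & \ddots & B_{r-1}^T \\ 0 & & B_{r-1} & M_r \end{bmatrix} \qquad (9.2.6)$$
where
$$Q = [\,X_1,\ldots,X_r\,], \qquad X_i \in \mathbb{R}^{n\times p},$$
is orthogonal, each $M_i \in \mathbb{R}^{p\times p}$, and each $B_i \in \mathbb{R}^{p\times p}$ is upper triangular.
Comparing blocks in $AQ = QT$ shows that
$$AX_k = X_{k-1}B_{k-1}^T + X_kM_k + X_{k+1}B_k, \qquad X_0B_0^T \equiv 0$$
for $k = 1{:}r-1$. From the orthogonality of $Q$ we have
$$M_k = X_k^TAX_k$$
for $k = 1{:}r$. Moreover, if we define
$$R_k = AX_k - X_kM_k - X_{k-1}B_{k-1}^T \in \mathbb{R}^{n\times p}$$
then $X_{k+1}B_k = R_k$ is a QR factorization of $R_k$. These observations suggest
that the block tridiagonal matrix $T$ in (9.2.6) can be generated as follows: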
    X_1 ∈ ℝ^{n×p} given with X_1^T X_1 = I_p.
    M_1 = X_1^T A X_1
    for k = 1:r-1                                                       (9.2.7)
        R_k = A X_k - X_k M_k - X_{k-1} B_{k-1}^T      (X_0 B_0^T ≡ 0)
        X_{k+1} B_k = R_k      (QR factorization of R_k)
        M_{k+1} = X_{k+1}^T A X_{k+1}
    end


At the beginning of the $k$th pass through the loop we have
$$A[\,X_1,\ldots,X_k\,] = [\,X_1,\ldots,X_k\,]\,T_k + R_k[\,0,\ldots,0,\ I_p\,] \qquad (9.2.8)$$
where
$$T_k = \begin{bmatrix} M_1 & B_1^T & & 0 \\ B_1 & M_2 & \ddots & \\ & \ddots & \ddots & B_{k-1}^T \\ 0 & & B_{k-1} & M_k \end{bmatrix}.$$
Using an argument similar to the one used in the proof of Theorem 9.1.1,
we can show that the $X_k$ are mutually orthogonal provided none of the $R_k$
are rank-deficient. However if $\mathrm{rank}(R_k) < p$ for some $k$, then it is possible
to choose the columns of $X_{k+1}$ such that $X_{k+1}^TX_i = 0$, for $i = 1{:}k$. See
Golub and Underwood (1977).

Because $T_k$ has bandwidth $p$, it can be efficiently reduced to tridiagonal
form using an algorithm of Schwartz (1968). Once tridiagonal form is
achieved, the Ritz values can be obtained via the symmetric QR algorithm. 
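Iteration (9.2.7) and relation (9.2.8) can be checked with a short NumPy sketch. The function name `block_lanczos` is hypothetical; it uses `numpy.linalg.qr` in reduced mode for $X_{k+1}B_k = R_k$, assumes every $R_k$ has full column rank, and performs no reorthogonalization. The random symmetric test matrix and the sizes $n = 60$, $p = 3$, $s = 5$ are illustrative.

```python
import numpy as np

def block_lanczos(A, X1, s):
    """s steps of block Lanczos (9.2.7) without reorthogonalization.

    Returns X = [X_1, ..., X_s], the block tridiagonal T_s, and the last R_s,
    so that A X = X T_s + R_s [0, ..., 0, I_p] as in (9.2.8).
    Assumes every R_k (k < s) has full column rank p.
    """
    p = X1.shape[1]
    Xs, Ms, Bs = [X1], [X1.T @ A @ X1], []
    for k in range(s):
        R = A @ Xs[k] - Xs[k] @ Ms[k]            # R_k = A X_k - X_k M_k ...
        if k > 0:
            R -= Xs[k - 1] @ Bs[k - 1].T         # ... - X_{k-1} B_{k-1}^T
        if k == s - 1:
            break                                # keep R_s for (9.2.8)
        Xn, B = np.linalg.qr(R)                  # X_{k+1} B_k = R_k, B_k upper triangular
        Xs.append(Xn); Bs.append(B); Ms.append(Xn.T @ A @ Xn)
    T = np.zeros((s * p, s * p))
    for k in range(s):
        T[k*p:(k+1)*p, k*p:(k+1)*p] = Ms[k]
        if k < s - 1:
            T[(k+1)*p:(k+2)*p, k*p:(k+1)*p] = Bs[k]      # B_k below the diagonal
            T[k*p:(k+1)*p, (k+1)*p:(k+2)*p] = Bs[k].T    # B_k^T above it
    return np.hstack(Xs), T, R

rng = np.random.default_rng(6)
n, p, s = 60, 3, 5
G = rng.standard_normal((n, n))
A = (G + G.T) / 2
X1, _ = np.linalg.qr(rng.standard_normal((n, p)))
X, T, Rs = block_lanczos(A, X1, s)
E = np.zeros((p, s * p)); E[:, -p:] = np.eye(p)          # [0, ..., 0, I_p]
print(np.linalg.norm(A @ X - X @ T - Rs @ E))            # of the order of roundoff
print(np.linalg.norm(X.T @ X - np.eye(s * p)))           # near-orthonormal blocks
```

The Ritz values are the eigenvalues of the banded matrix `T`; here a dense symmetric eigensolver would simply replace the band reduction mentioned above.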

In order to intelligently decide when to use block Lanczos, it is necessary 
to understand how the block dimension affects convergence of the Ritz 
values. The following generalization of Theorem 9.1.3 sheds light on this 
issue. 


Theorem 9.2.2 Let $A$ be an $n$-by-$n$ symmetric matrix with eigenvalues
$\lambda_1 \ge \cdots \ge \lambda_n$ and corresponding orthonormal eigenvectors $z_1,\ldots,z_n$. Let
$\mu_1 \ge \cdots \ge \mu_p$ be the $p$ largest eigenvalues of the matrix $T_k$ obtained after
$k$ steps of the block Lanczos iteration (9.2.7). If $Z_1 = [\,z_1,\ldots,z_p\,]$ and
$\cos(\theta_p) = \sigma_p(Z_1^TX_1) > 0$, then for $i = 1{:}p$, $\lambda_i \ge \mu_i \ge \lambda_i - \epsilon_i^2$ where
$$\epsilon_i^2 = \frac{(\lambda_1 - \lambda_n)\tan^2(\theta_p)}{\left[c_{k-1}\!\left(\dfrac{1 + \gamma_i}{1 - \gamma_i}\right)\right]^2}, \qquad \gamma_i = \frac{\lambda_i - \lambda_{p+1}}{\lambda_i - \lambda_n},$$
and $c_{k-1}(z)$ is the Chebyshev polynomial of degree $k-1$.
Proof. See Underwood (1975). □


Analogous inequalities can be obtained for $T_k$'s smallest eigenvalues by
applying the theorem with A replaced by —A. 

Based on Theorem 9.2.2 and scrutiny of the block Lanczos iteration 
(9.2.7) we may conclude that: 


• the error bounds for the Ritz values improve with increased $p$.


• the amount of work required to compute $T_k$'s eigenvalues is proportional to $p^2$.


• the block dimension should be at least as large as the largest multiplicity
of any sought-after eigenvalue.


How to determine block dimension in the face of these tradeoffs is discussed 
in detail by Scott (1979). 

Loss of orthogonality also plagues the block Lanczos algorithm. How- 
ever, all of the orthogonality enforcement schemes described above can be 
extended to the block setting. 


9.2.7  s-Step Lanczos 


The block Lanczos algorithm (9.2.7) can be used in an iterative fashion
to calculate selected eigenvalues of $A$. To fix ideas, suppose we wish to
calculate the $p$ largest eigenvalues. If $X_1 \in \mathbb{R}^{n\times p}$ is a given matrix having
orthonormal columns, we may proceed as follows:


    until ||A X_1 - X_1 T_1|| is small enough
        Generate X_2, ..., X_s ∈ ℝ^{n×p} via the block Lanczos algorithm.
        Form T_s = [X_1, ..., X_s]^T A [X_1, ..., X_s], an sp-by-sp,
            p-diagonal matrix.
        Compute an orthogonal matrix U = [u_1, ..., u_sp] such that
            U^T T_s U = diag(θ_1, ..., θ_sp) with θ_1 ≥ ··· ≥ θ_sp.
        Set X_1 = [X_1, ..., X_s][u_1, ..., u_p].
    end


This is the block analog of the s-step Lanczos algorithm, which has been
extensively analyzed by Cullum and Donath (1974) and Underwood (1975). 

The same idea can also be used to compute several of A’s smallest eigen- 
values or a mixture of both large and small eigenvalues. See Cullum (1978). 
The choice of the parameters s and p depends upon storage constraints as 
well as upon the factors we mentioned above in our discussion of block 
dimension. The block dimension p may be diminished as the good Ritz 
vectors emerge. However this demands that orthogonality to the converged
vectors be enforced. See Cullum and Donath (1974). 


Problems 


P9.2.1 Prove Lemma 9.2.1. 


P9.2.2 If $\mathrm{rank}(R_k) < p$ in (9.2.7), does it follow that $\mathrm{range}([X_1,\ldots,X_k])$ contains an
eigenvector of $A$?
Other practical details associated with the implementation of the Lanczos procedure are
discussed in


D.S. Scott (1979). “How to Make the Lanczos Algorithm Converge Slowly,” Math. 
Comp. 33, 239-47. 
J. Cullum and W.E. Donath (1974). “A Block Lanczos Algorithm for Computing the q
Algebraically Largest Eigenvalues and a Corresponding Eigenspace of Large Sparse 
Real Symmetric Matrices,” Proc. of the 1974 IEEE Conf. on Decision and Control, 
Phoenix, Arizona, pp. 505-9. 

R. Underwood (1975). “An Iterative Block Lanczos Method for the Solution of Large 
Sparse Symmetric Eigenproblems,” Report STAN-CS-75-495, Department of Com- 
puter Science, Stanford University, Stanford, California. 

G.H. Golub and R. Underwood (1977). “The Block Lanczos Method for Computing 
Eigenvalues,” in Mathematical Software III , ed. J. Rice, Academic Press, New York, 
papers deal with this issue:


A.K. Cline, G.H. Golub, and G.W. Platzman (1976). “Calculation of Normal Modes of 
Oceans Using a Lanczos Method,” in Sparse Matrix Computations, ed. J.R. Bunch
and D.J. Rose, Academic Press, New York, pp. 409-26. 
9.3 Applications to Ax = b and Least Squares


In this section we briefly show how the Lanczos iteration can be embellished 
to solve large sparse linear equation and least squares problems. For further 
details, we recommend Saunders (1995). 


9.3.1 Symmetric Positive Definite Systems 


Suppose $A \in \mathbb{R}^{n\times n}$ is symmetric and positive definite and consider the functional $\phi(x)$ defined by

$$\phi(x) = \tfrac{1}{2}x^TAx - x^Tb$$

where $b \in \mathbb{R}^n$. Since $\nabla\phi(x) = Ax - b$, it follows that $x = A^{-1}b$ is the unique
minimizer of $\phi$. Hence, an approximate minimizer of $\phi$ can be regarded as
an approximate solution to $Ax = b$.

Suppose $x_0 \in \mathbb{R}^n$ is an initial guess. One way to produce a vector sequence $\{x_k\}$ that converges to $x$ is to generate a sequence of orthonormal
vectors $\{q_k\}$ and to let $x_k$ minimize $\phi$ over the set

$$x_0 + \mathrm{span}\{q_1,\ldots,q_k\} = \{\, x_0 + a_1q_1 + \cdots + a_kq_k : a_i \in \mathbb{R} \,\}$$


for $k = 1{:}n$. If $Q_k = [q_1,\ldots,q_k]$, then this just means choosing $y \in \mathbb{R}^k$
such that

$$\begin{aligned} \phi(x_0 + Q_ky) &= \tfrac12 (x_0+Q_ky)^TA(x_0+Q_ky) - (x_0+Q_ky)^Tb \\ &= \tfrac12\, y^T(Q_k^TAQ_k)y - y^TQ_k^T(b - Ax_0) + \phi(x_0) \end{aligned}$$

is minimized. Setting the gradient of this expression with respect to $y$ to zero, we see that

$$x_k = x_0 + Q_ky_k \tag{9.3.1}$$

where

$$(Q_k^TAQ_k)\,y_k = Q_k^T(b - Ax_0). \tag{9.3.2}$$


When $k = n$ the minimization is over all of $\mathbb{R}^n$ and so $Ax_n = b$.
For large sparse $A$ it is necessary to overcome two hurdles in order to
make this an effective solution process:

• the linear system (9.3.2) must be “easily” solved.
• we must be able to compute $x_k$ without having to refer to $q_1,\ldots,q_k$ explicitly as (9.3.1) suggests. Otherwise there would be an excessive amount of data movement.
We show that both of these requirements are met if the $q_k$ are Lanczos
vectors.
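After $k$ steps of the Lanczos algorithm we obtain the factorization

$$AQ_k = Q_kT_k + r_ke_k^T \tag{9.3.3}$$

where

$$T_k = Q_k^TAQ_k = \begin{bmatrix} \alpha_1 & \beta_1 & & 0 \\ \beta_1 & \alpha_2 & \ddots & \\ & \ddots & \ddots & \beta_{k-1} \\ 0 & & \beta_{k-1} & \alpha_k \end{bmatrix}. \tag{9.3.4}$$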


With this approach (9.3.2) becomes a symmetric positive definite tridiagonal system which may be solved via the $LDL^T$ factorization. (See Algorithm 4.3.6.) In particular, by setting


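$$L_k = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ \mu_1 & 1 & \ddots & \vdots \\ & \ddots & \ddots & 0 \\ 0 & & \mu_{k-1} & 1 \end{bmatrix}, \qquad D_k = \mathrm{diag}(d_1,\ldots,d_k),$$

we find by comparing entries in

$$T_k = L_kD_kL_k^T \tag{9.3.5}$$

that

    d_1 = α_1
    for i = 2:k
        μ_{i−1} = β_{i−1}/d_{i−1}
        d_i = α_i − β_{i−1}μ_{i−1}
    end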
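The recurrence above can be sketched in Python as follows. This is a minimal illustration, assuming $T_k$ is stored as two plain lists (the diagonal $\alpha$ and the off-diagonal $\beta$) and that $T_k$ is positive definite, so that no pivoting is needed; the function name `tridiag_ldlt` is hypothetical.

```python
def tridiag_ldlt(alpha, beta):
    """LDL^T factorization of a symmetric tridiagonal matrix T.

    alpha -- diagonal entries alpha_1..alpha_k
    beta  -- off-diagonal entries beta_1..beta_{k-1}
    Returns (mu, d): mu is the subdiagonal of the unit lower bidiagonal L,
    d is the diagonal of D. No pivoting is done, so T is assumed to be
    positive definite (every d_i > 0).
    """
    k = len(alpha)
    d = [0.0] * k
    mu = [0.0] * (k - 1)
    d[0] = alpha[0]                                  # d_1 = alpha_1
    for i in range(1, k):
        mu[i - 1] = beta[i - 1] / d[i - 1]           # mu_{i-1} = beta_{i-1} / d_{i-1}
        d[i] = alpha[i] - beta[i - 1] * mu[i - 1]    # d_i = alpha_i - beta_{i-1} mu_{i-1}
    return mu, d
```

Each new pair $(\mu_{k-1}, d_k)$ depends only on $d_{k-1}$, which is why the factorization can be extended one step at a time as the Lanczos iteration proceeds.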


Note that we need only calculate the quantities

$$\mu_{k-1} = \beta_{k-1}/d_{k-1}, \qquad d_k = \alpha_k - \beta_{k-1}\mu_{k-1} \tag{9.3.6}$$

in order to obtain $L_k$ and $D_k$ from $L_{k-1}$ and $D_{k-1}$.
As we mentioned, it is critical to be able to compute $x_k$ in (9.3.1) efficiently. To this end we define $C_k \in \mathbb{R}^{n\times k}$ and $p_k \in \mathbb{R}^k$ by the equations

$$\begin{aligned} C_kL_k^T &= Q_k \\ L_kD_kp_k &= Q_k^T(b - Ax_0) \end{aligned} \tag{9.3.7}$$
and observe that if $r_0 = b - Ax_0$ then

$$x_k = x_0 + Q_kT_k^{-1}Q_k^Tr_0 = x_0 + Q_k(L_kD_kL_k^T)^{-1}Q_k^Tr_0 = x_0 + C_kp_k.$$


Let $C_k = [c_1,\ldots,c_k]$ be a column partitioning. It follows from (9.3.7) that

$$[\,c_1,\; \mu_1c_1 + c_2,\; \ldots,\; \mu_{k-1}c_{k-1} + c_k\,] = [\,q_1,\ldots,q_k\,]$$

and therefore $C_k = [\,C_{k-1},\, c_k\,]$ where

$$c_k = q_k - \mu_{k-1}c_{k-1}.$$


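Also observe that if we set $p_k = [\rho_1,\ldots,\rho_k]^T$ in $L_kD_kp_k = Q_k^Tr_0$, then that equation becomes

$$\begin{bmatrix} L_{k-1}D_{k-1} & 0 \\ \mu_{k-1}d_{k-1}e_{k-1}^T & d_k \end{bmatrix} \begin{bmatrix} \rho_1 \\ \vdots \\ \rho_{k-1} \\ \rho_k \end{bmatrix} = \begin{bmatrix} q_1^Tr_0 \\ \vdots \\ q_{k-1}^Tr_0 \\ q_k^Tr_0 \end{bmatrix}.$$

Since $L_{k-1}D_{k-1}p_{k-1} = Q_{k-1}^Tr_0$, it follows that

$$p_k = \begin{bmatrix} p_{k-1} \\ \rho_k \end{bmatrix}$$

where

$$\rho_k = (q_k^Tr_0 - \mu_{k-1}d_{k-1}\rho_{k-1})/d_k,$$

and thus

$$x_k = x_0 + C_kp_k = x_0 + C_{k-1}p_{k-1} + \rho_kc_k = x_{k-1} + \rho_kc_k.$$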


This is precisely the kind of recursive formula for $x_k$ that we need. Together with (9.3.6) and (9.3.7) it enables us to make the transition from $(q_{k-1}, c_{k-1}, x_{k-1})$ to $(q_k, c_k, x_k)$ with a minimal amount of work and storage.

A further simplification results if we set $q_1$ to be a unit vector in the direction of the initial residual $r_0 = b - Ax_0$. With this choice for a Lanczos starting vector, $q_k^Tr_0 = 0$ for $k \geq 2$. It follows from (9.3.3) that

$$\begin{aligned} b - Ax_k = b - A(x_0 + Q_ky_k) &= r_0 - (Q_kT_k + r_ke_k^T)y_k \\ &= r_0 - Q_kQ_k^Tr_0 - r_ke_k^Ty_k = -r_ke_k^Ty_k. \end{aligned}$$
Thus, if $\beta_k = \|r_k\|_2 = 0$ in the Lanczos iteration, then $Ax_k = b$. Moreover, $\|Ax_k - b\|_2 = \beta_k|e_k^Ty_k|$ and so estimates of the current residual can be obtained as a by-product of the iteration. Overall, we have the following procedure.


Algorithm 9.3.1 If $A \in \mathbb{R}^{n\times n}$ is symmetric positive definite, $b \in \mathbb{R}^n$, and $x_0 \in \mathbb{R}^n$ is an initial guess ($Ax_0 \approx b$), then this algorithm computes the solution to $Ax = b$.

    r_0 = b − Ax_0
    β_0 = ‖r_0‖_2
    q_0 = 0
    k = 0
    while β_k ≠ 0
        q_{k+1} = r_k/β_k
        k = k + 1
        α_k = q_k^T A q_k
        r_k = (A − α_k I)q_k − β_{k−1}q_{k−1}
        β_k = ‖r_k‖_2
        if k = 1
            d_1 = α_1
            c_1 = q_1
            ρ_1 = β_0/α_1
            x_1 = x_0 + ρ_1 q_1
        else
            μ_{k−1} = β_{k−1}/d_{k−1}
            d_k = α_k − β_{k−1}μ_{k−1}
            c_k = q_k − μ_{k−1}c_{k−1}
            ρ_k = −μ_{k−1}d_{k−1}ρ_{k−1}/d_k
            x_k = x_{k−1} + ρ_k c_k
        end
    end
    x = x_k


This algorithm requires one matrix-vector multiplication and a couple of 
saxpy operations per iteration. The numerical behavior of Algorithm 9.3.1 
is discussed in the next chapter, where it is rederived and identified as the 
widely known method of conjugate gradients. 
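As a rough check on the recurrences, Algorithm 9.3.1 can be transcribed into plain Python. The sketch below is illustrative only: it assumes a small dense matrix stored as a list of rows (only products with $A$ are actually needed), and it replaces the exact test $\beta_k \neq 0$ with a relative tolerance and an iteration cap, since $\beta_k$ is rarely exactly zero in floating point. The names `lanczos_solve`, `tol`, and `maxit` are hypothetical.

```python
import math


def matvec(A, x):
    """Dense matrix-vector product y = A x (A is a list of rows)."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def lanczos_solve(A, b, x0, tol=1e-12, maxit=None):
    """Sketch of Algorithm 9.3.1 for a symmetric positive definite A."""
    n = len(b)
    maxit = n if maxit is None else maxit
    r = [bi - ai for bi, ai in zip(b, matvec(A, x0))]   # r_0 = b - A x_0
    beta0 = math.sqrt(dot(r, r))                        # beta_0 = ||r_0||_2
    x = list(x0)
    if beta0 == 0.0:
        return x
    q_prev = [0.0] * n                                  # q_0 = 0
    beta = beta0                # holds beta_{k-1} at the top of each pass
    k = 0
    while beta > tol * beta0 and k < maxit:
        q = [ri / beta for ri in r]                     # q_{k+1} = r_k / beta_k
        k += 1
        Aq = matvec(A, q)
        alpha = dot(q, Aq)                              # alpha_k = q_k^T A q_k
        # r_k = (A - alpha_k I) q_k - beta_{k-1} q_{k-1}
        r = [aq - alpha * qi - beta * qp for aq, qi, qp in zip(Aq, q, q_prev)]
        beta_new = math.sqrt(dot(r, r))                 # beta_k = ||r_k||_2
        if k == 1:
            d = alpha                                   # d_1 = alpha_1
            c = list(q)                                 # c_1 = q_1
            rho = beta0 / alpha                         # rho_1 = beta_0 / alpha_1
        else:
            mu = beta / d                               # mu_{k-1} = beta_{k-1} / d_{k-1}
            d_new = alpha - beta * mu                   # d_k
            c = [qi - mu * ci for qi, ci in zip(q, c)]  # c_k = q_k - mu_{k-1} c_{k-1}
            rho = -mu * d * rho / d_new                 # rho_k
            d = d_new
        x = [xi + rho * ci for xi, ci in zip(x, c)]     # x_k = x_{k-1} + rho_k c_k
        q_prev = q
        beta = beta_new
    return x
```

Only $q_{k-1}$, $c_{k-1}$, $x_{k-1}$ and a handful of scalars are carried from one pass to the next, which is the storage economy the derivation was aiming for.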


9.3.2 Symmetric Indefinite Systems 


A key feature in the above development is the idea of computing the $LDL^T$ factorization of the tridiagonal matrices $T_k$. Unfortunately, this is potentially unstable if $A$, and consequently $T_k$, is not positive definite. A way around this difficulty proposed by Paige and Saunders (1975) is to develop the recursion for $x_k$ via an “LQ” factorization of $T_k$. In particular, at the $k$th step of the iteration, we have Givens rotations $J_1,\ldots,J_{k-1}$ such that

$$T_kJ_1\cdots J_{k-1} = L_k = \begin{bmatrix} d_1 & 0 & & \cdots & 0 \\ e_1 & d_2 & 0 & & \vdots \\ f_1 & e_2 & d_3 & \ddots & \\ & \ddots & \ddots & \ddots & 0 \\ 0 & & f_{k-2} & e_{k-1} & d_k \end{bmatrix}.$$


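Note that with this factorization $x_k$ is given by

$$x_k = x_0 + Q_ky_k = x_0 + Q_kT_k^{-1}Q_k^Tr_0 = x_0 + W_ks_k$$

where

$$W_k = Q_kJ_1\cdots J_{k-1} \in \mathbb{R}^{n\times k}$$

and $s_k \in \mathbb{R}^k$ solves

$$L_ks_k = Q_k^Tr_0.$$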


Scrutiny of these equations enables one to develop a formula for computing $x_k$ from $x_{k-1}$ and an easily computed multiple of $w_k$, the last column of $W_k$. This defines the SYMMLQ method set forth in Paige and Saunders (1975).

A different idea is to notice from (9.3.3) and the definition $\beta_kq_{k+1} = r_k$ that

$$AQ_k = Q_kT_k + \beta_kq_{k+1}e_k^T = Q_{k+1}H_k$$

where

$$H_k = \begin{bmatrix} T_k \\ \beta_ke_k^T \end{bmatrix}.$$

This $(k+1)$-by-$k$ matrix is upper Hessenberg and figures in the MINRES method of Paige and Saunders (1975). In this technique $x_k$ minimizes $\|Ax - b\|_2$ over the set $x_0 + \mathrm{span}\{q_1,\ldots,q_k\}$. Note that


$$\begin{aligned} \|A(x_0 + Q_ky) - b\|_2 &= \|AQ_ky - (b - Ax_0)\|_2 \\ &= \|Q_{k+1}H_ky - (b - Ax_0)\|_2 = \|H_ky - \beta_0e_1\|_2 \end{aligned}$$

where it is assumed that $q_1 = (b - Ax_0)/\beta_0$ is a unit vector. As in SYMMLQ, it is possible to develop recursions that permit the efficient computation of $x_k$ from its predecessor $x_{k-1}$. The QR factorization of $H_k$ is involved.
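The identity $\|A(x_0 + Q_ky) - b\|_2 = \|H_ky - \beta_0e_1\|_2$ can be checked with a short, deliberately non-recursive Python sketch: it runs $k$ Lanczos steps, stores $Q_k$ and $H_k$ explicitly, and solves the small Hessenberg least squares problem with Givens rotations. This is not the MINRES recursion of Paige and Saunders (which avoids storing $Q_k$); it assumes a small dense symmetric $A$, no Lanczos breakdown before step $k$, and no reorthogonalization. The helper names `hessenberg_lstsq` and `minres_sketch` are hypothetical.

```python
import math


def matvec(A, x):
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def hessenberg_lstsq(H, rhs):
    """Solve min ||H y - rhs||_2 for a (k+1)-by-k upper Hessenberg H.

    Applies k Givens rotations to make H upper triangular, then back
    substitutes. Returns (y, residual norm)."""
    R = [row[:] for row in H]
    g = list(rhs)
    k = len(H[0])
    for j in range(k):
        a, b = R[j][j], R[j + 1][j]
        h = math.hypot(a, b)
        c, s = a / h, b / h
        for col in range(j, k):                      # rotate rows j and j+1
            t1, t2 = R[j][col], R[j + 1][col]
            R[j][col] = c * t1 + s * t2
            R[j + 1][col] = -s * t1 + c * t2
        g[j], g[j + 1] = c * g[j] + s * g[j + 1], -s * g[j] + c * g[j + 1]
    y = [0.0] * k
    for i in range(k - 1, -1, -1):
        y[i] = (g[i] - sum(R[i][j] * y[j] for j in range(i + 1, k))) / R[i][i]
    return y, abs(g[k])


def minres_sketch(A, b, x0, k):
    """x_k minimizing ||A x - b||_2 over x_0 + span{q_1, ..., q_k}."""
    n = len(b)
    r0 = [bi - ai for bi, ai in zip(b, matvec(A, x0))]
    beta0 = math.sqrt(dot(r0, r0))
    Q = [[ri / beta0 for ri in r0]]                  # q_1 = r_0 / beta_0
    H = [[0.0] * k for _ in range(k + 1)]            # H_k = [T_k; beta_k e_k^T]
    q_prev, beta_prev = [0.0] * n, 0.0
    for j in range(k):
        Aq = matvec(A, Q[j])
        alpha = dot(Q[j], Aq)
        r = [aq - alpha * qi - beta_prev * qp for aq, qi, qp in zip(Aq, Q[j], q_prev)]
        beta = math.sqrt(dot(r, r))
        H[j][j] = alpha
        H[j + 1][j] = beta
        if j + 1 < k:
            H[j][j + 1] = beta
            Q.append([ri / beta for ri in r])
        q_prev, beta_prev = Q[j], beta
    y, res = hessenberg_lstsq(H, [beta0] + [0.0] * k)  # right-hand side beta_0 e_1
    x = list(x0)
    for j in range(k):
        x = [xi + y[j] * qi for xi, qi in zip(x, Q[j])]
    return x, res
```

Nothing here requires $A$ to be positive definite, which is the point of MINRES; the returned residual norm is nonincreasing in $k$ because the search spaces are nested.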

The behavior of the conjugate gradient method is detailed in the next 
chapter. The convergence of SYMMLQ and MINRES is more complicated 
and is discussed in Paige, Parlett, and Van Der Vorst (1995). 
9.3.3 Bidiagonalization and the SVD
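Suppose $U^TAV = B$ represents the bidiagonalization of $A \in \mathbb{R}^{m\times n}$ with

$$U = [u_1,\ldots,u_m], \quad U^TU = I_m, \qquad V = [v_1,\ldots,v_n], \quad V^TV = I_n,$$

and

$$B = \begin{bmatrix} \alpha_1 & \beta_1 & & 0 \\ 0 & \alpha_2 & \ddots & \\ \vdots & & \ddots & \beta_{n-1} \\ 0 & \cdots & 0 & \alpha_n \\ 0 & \cdots & \cdots & 0 \end{bmatrix}, \tag{9.3.8}$$

where the bottom block of zeros has $m - n$ rows.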


Recall from §5.4.3 that this factorization may be computed using House- 
holder transformations and that it serves as a front end for the SVD algo- 
rithm. 

Unfortunately, if A is large and sparse, then we can expect large, dense 
submatrices to arise during the Householder bidiagonalization. Conse- 
quently, it would be nice to develop a means for computing B directly 
without any orthogonal updates of the matrix A. 

Proceeding just as we did in §9.1.2 we compare columns in the equations $AV = UB$ and $A^TU = VB^T$ for $k = 1{:}n$ and obtain

$$\begin{aligned} Av_k &= \alpha_ku_k + \beta_{k-1}u_{k-1}, & \beta_0u_0 &\equiv 0 \\ A^Tu_k &= \alpha_kv_k + \beta_kv_{k+1}, & \beta_nv_{n+1} &\equiv 0 \end{aligned} \tag{9.3.9}$$

Defining

$$r_k = Av_k - \beta_{k-1}u_{k-1}, \qquad p_k = A^Tu_k - \alpha_kv_k,$$

we may conclude from orthonormality that $\alpha_k = \pm\|r_k\|_2$, $u_k = r_k/\alpha_k$, $\beta_k = \pm\|p_k\|_2$, and $v_{k+1} = p_k/\beta_k$. Properly sequenced, these equations define the Lanczos method for bidiagonalizing a rectangular matrix:


    v_1 = given unit 2-norm n-vector
    p_0 = v_1; β_0 = 1; k = 0; u_0 = 0
    while β_k ≠ 0
        v_{k+1} = p_k/β_k
        k = k + 1
        r_k = Av_k − β_{k−1}u_{k−1}                (9.3.10)
        α_k = ‖r_k‖_2
        u_k = r_k/α_k
        p_k = A^T u_k − α_k v_k
        β_k = ‖p_k‖_2
    end
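A direct Python transcription of (9.3.10) follows. It is a minimal sketch, assuming a small dense $A$ stored as a list of rows, a fixed number of steps in place of the exact test $\beta_k \neq 0$, nonzero $\alpha_k$ (full column rank), and no reorthogonalization; the function name `lanczos_bidiag` is hypothetical. It returns the $u$ and $v$ vectors together with the $\alpha$'s and $\beta$'s so that the relations (9.3.9) can be checked.

```python
import math


def matvec(A, x):
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


def rmatvec(A, y):
    """Compute A^T y without forming A^T."""
    return [sum(A[i][j] * y[i] for i in range(len(A))) for j in range(len(A[0]))]


def norm(x):
    return math.sqrt(sum(t * t for t in x))


def lanczos_bidiag(A, v1, steps, tol=1e-14):
    """Upper bidiagonalization (9.3.10) started from the unit vector v1."""
    m = len(A)
    U, V, alphas, betas = [], [], [], []
    p, beta = list(v1), 1.0                  # p_0 = v_1, beta_0 = 1
    u = [0.0] * m                            # u_0 = 0
    for _ in range(steps):
        if beta <= tol:                      # beta_k = 0: stop
            break
        v = [pi / beta for pi in p]                                  # v_{k+1} = p_k / beta_k
        r = [avi - beta * ui for avi, ui in zip(matvec(A, v), u)]    # r_k = A v_k - beta_{k-1} u_{k-1}
        alpha = norm(r)                                              # alpha_k = ||r_k||_2
        u = [ri / alpha for ri in r]                                 # u_k = r_k / alpha_k
        p = [atu - alpha * vi for atu, vi in zip(rmatvec(A, u), v)]  # p_k = A^T u_k - alpha_k v_k
        beta = norm(p)                                               # beta_k = ||p_k||_2
        V.append(v)
        U.append(u)
        alphas.append(alpha)
        betas.append(beta)
    return U, V, alphas, betas
```

Only two products per step, one with $A$ and one with $A^T$, touch the matrix, so $A$ is never modified; this is the property that makes the scheme attractive for large sparse problems.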
If $\mathrm{rank}(A) = n$, then we can guarantee that no zero $\alpha_k$ arise. Indeed, if $\alpha_k = 0$ then $\mathrm{span}\{Av_1,\ldots,Av_k\} \subset \mathrm{span}\{u_1,\ldots,u_{k-1}\}$, which implies rank deficiency.

If $\beta_k = 0$, then it is not hard to verify that

$$\begin{aligned} A[v_1,\ldots,v_k] &= [u_1,\ldots,u_k]B_k \\ A^T[u_1,\ldots,u_k] &= [v_1,\ldots,v_k]B_k^T \end{aligned}$$

where $B_k = B(1{:}k,1{:}k)$ and $B$ is prescribed by (9.3.8). Thus, the $v$ vectors and the $u$ vectors span right and left singular subspaces of $A$, and $\sigma(B_k) \subset \sigma(A)$. Lanczos bidiagonalization is discussed in Paige (1974). See also Cullum and Willoughby (1985a, 1985b). It is essentially equivalent to applying the Lanczos tridiagonalization scheme to the symmetric matrix

$$C = \begin{bmatrix} 0 & A \\ A^T & 0 \end{bmatrix}.$$


We showed that $\lambda_i(C) = \sigma_i(A) = -\lambda_{n+m-i+1}(C)$ for $i = 1{:}n$ at the
beginning of §8.6. Because of this, it is not surprising that the large singular
values of the bidiagonal matrix tend to be very good approximations to the 
large singular values of A. The small singular values of A correspond to the 
interior eigenvalues of C and are not so well approximated. The equivalent 
of the Kaniel-Paige theory for the Lanczos bidiagonalization may be found 
in Luk (1978) as well as in Golub, Luk, and Overton (1981). The analytic, 
algorithmic, and numerical developments of the previous two sections all 
carry over naturally to the bidiagonalization. 


9.3.4 Least Squares 


The full-rank LS problem $\min\|Ax - b\|_2$ can be solved via the bidiagonalization. In particular,

$$x_{LS} = Vy = \sum_{i=1}^{n} y_iv_i$$

where $y = [y_1,\ldots,y_n]^T$ solves the system $By = [u_1^Tb,\ldots,u_n^Tb]^T$. Note that because $B$ is upper bidiagonal, we cannot solve for $y$ until the bidiagonalization is complete. Moreover, we are required to save the vectors $v_1,\ldots,v_n$, an unhappy circumstance if $n$ is large.

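The development of a sparse least squares algorithm based on the bidiagonalization can be accomplished more favorably if $A$ is reduced to lower bidiagonal form

$$U^TAV = B = \begin{bmatrix} \alpha_1 & 0 & \cdots & 0 \\ \beta_1 & \alpha_2 & & \vdots \\ & \ddots & \ddots & 0 \\ & & \beta_{n-1} & \alpha_n \\ 0 & \cdots & 0 & \beta_n \\ 0 & \cdots & \cdots & 0 \end{bmatrix}$$

where $V = [v_1,\ldots,v_n]$ and $U = [u_1,\ldots,u_m]$ are orthogonal. Comparing columns in the equations $A^TU = VB^T$ and $AV = UB$ we obtain

$$\begin{aligned} A^Tu_k &= \beta_{k-1}v_{k-1} + \alpha_kv_k, & v_0 &\equiv 0 \\ Av_k &= \alpha_ku_k + \beta_ku_{k+1}. \end{aligned}$$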


It is straightforward to develop a Lanczos procedure from these equations and the resulting algorithm is very similar to (9.3.10), only $u_1$ is the starting vector.

Define the matrices $V_k = [v_1,\ldots,v_k]$, $U_k = [u_1,\ldots,u_k]$, and $B_k = B(1{:}k+1, 1{:}k)$ and observe that $AV_k = U_{k+1}B_k$. Our goal is to compute $x_k$, the minimizer of $\|Ax - b\|_2$ over all vectors of the form $x = x_0 + V_ky$, where $y \in \mathbb{R}^k$ and $x_0 \in \mathbb{R}^n$ is an initial guess. If $u_1 = (b - Ax_0)/\beta_0$ with $\beta_0 = \|b - Ax_0\|_2$, then

$$A(x_0 + V_ky) - b = U_{k+1}B_ky - \beta_0U_{k+1}e_1 = U_{k+1}(B_ky - \beta_0e_1)$$

where $e_1 = I_{k+1}(:,1)$. It follows that if $y_k$ solves the $(k+1)$-by-$k$ lower bidiagonal LS problem

$$\min_y \|B_ky - \beta_0e_1\|_2$$

then $x_k = x_0 + V_ky_k$. Since $B_k$ is lower bidiagonal, it is easy to compute Givens rotations $J_1,\ldots,J_k$ such that

$$J_k\cdots J_1B_k = \begin{bmatrix} R_k \\ 0 \end{bmatrix}$$

where $R_k \in \mathbb{R}^{k\times k}$ is upper bidiagonal. If $d_k \in \mathbb{R}^k$ denotes the first $k$ components of $J_k\cdots J_1U_{k+1}^T(b - Ax_0) = J_k\cdots J_1(\beta_0e_1)$, then it follows that $x_k = x_0 + V_ky_k = x_0 + W_kd_k$ where $W_k = V_kR_k^{-1}$. Paige and Saunders (1982a) show how $x_k$ can be obtained from $x_{k-1}$ via a simple recursion that involves the last column of $W_k$. The net result is a sparse LS algorithm referred to as LSQR that requires only a few $n$-vectors of storage to implement.
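For concreteness, here is a compact Python sketch of the LSQR recursion as it is commonly stated following Paige and Saunders (1982a). It is an illustration under simplifying assumptions, not a transcription of their published code: $x_0 = 0$ (so $u_1$ is parallel to $b$), a small dense $A$ stored as a list of rows, a fixed iteration count, and only a crude breakdown check. Each pass extends the lower bidiagonalization by one column, applies one Givens rotation to $B_k$, and updates $x_k$ with the last column of $W_k$ (the vector `w`). The returned value is the magnitude of the last component of the rotated right-hand side, which equals $\|b - Ax_k\|_2$ in exact arithmetic; the name `lsqr_sketch` is hypothetical.

```python
import math


def matvec(A, x):
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


def rmatvec(A, y):
    """Compute A^T y without forming A^T."""
    return [sum(A[i][j] * y[i] for i in range(len(A))) for j in range(len(A[0]))]


def norm(x):
    return math.sqrt(sum(t * t for t in x))


def lsqr_sketch(A, b, iters, tol=1e-14):
    """Approximate argmin ||A x - b||_2 after `iters` LSQR steps, with x_0 = 0."""
    n = len(A[0])
    x = [0.0] * n
    beta = norm(b)
    u = [bi / beta for bi in b]                  # u_1 = b / ||b||_2
    v = rmatvec(A, u)
    alpha = norm(v)
    v = [vi / alpha for vi in v]                 # v_1 = A^T u_1 / alpha_1
    w = list(v)                                  # current column of W
    phibar, rhobar = beta, alpha
    for _ in range(iters):
        # Continue the lower bidiagonalization: one new u and one new v.
        u = [avi - alpha * ui for avi, ui in zip(matvec(A, v), u)]
        beta = norm(u)
        if beta > tol:
            u = [ui / beta for ui in u]
        v = [atu - beta * vi for atu, vi in zip(rmatvec(A, u), v)]
        alpha = norm(v)
        # Givens rotation that annihilates the subdiagonal entry beta.
        rho = math.hypot(rhobar, beta)
        c, s = rhobar / rho, beta / rho
        theta = s * alpha
        rhobar = -c * alpha
        phi = c * phibar
        phibar = s * phibar
        x = [xi + (phi / rho) * wi for xi, wi in zip(x, w)]   # x_k = x_{k-1} + (phi/rho) w
        if alpha <= tol:                                       # bidiagonalization exhausted
            break
        v = [vi / alpha for vi in v]
        w = [vi - (theta / rho) * wi for vi, wi in zip(v, w)]
    return x, abs(phibar)
```

Apart from $x$, the loop keeps only $u$, $v$, and $w$, which matches the “few $n$-vectors of storage” claim; the $m$-vector $u$ is the only long vector.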


Problems 


P9.3.1 Modify Algorithm 9.3.1 so that it implements the indefinite symmetric solver 
outlined in §9.3.2. 


P9.3.2 How many vector workspaces are required to implement efficiently (9.3.10)? 


P9.3.3 Suppose $A$ is rank deficient and $\alpha_k = 0$ in (9.3.10). How could $u_k$ be obtained
so that the iteration could continue?


P9.3.4 Work out the lower bidiagonal version of (9.3.10) and detail the least squares
solver sketched in §9.3.4.


Notes and References for Sec. 9.3 


Much of the material in this section has been distilled from the following papers: 


C.C. Paige (1974). “Bidiagonalization of Matrices and Solution of Linear Equations,”
SIAM J. Num. Anal. 11, 197-209.

C.C. Paige and M.A. Saunders (1975). “Solution of Sparse Indefinite Systems of Linear 
Equations,” SIAM J. Num. Anal. 12, 617-29. 

C.C. Paige and M.A. Saunders (1982a). “LSQR: An Algorithm for Sparse Linear Equa- 
tions and Sparse Least Squares,” ACM Trans. Math. Soft. 8, 43-71. 

C.C. Paige and M.A. Saunders (1982b). “Algorithm 583 LSQR: Sparse Linear Equations
and Least Squares Problems,” ACM Trans. Math. Soft. 8, 195-209.

M.A. Saunders (1995). “Solution of Sparse Rectangular Systems,” BIT 35, 588-604.


See also Cullum and Willoughby (1985a, 1985b) and


O. Widlund (1978). “A Lanczos Method for a Class of Nonsymmetric Systems of Linear 
Equations,” SIAM J. Numer. Anal. 15, 801-12.

B.N. Parlett (1980). “A New Look at the Lanczos Algorithm for Solving Symmetric 
Systems of Linear Equations,” Lin. Alg. and Its Applic. 29, 323-46. 

G.H. Golub, F.T. Luk, and M. Overton (1981). “A Block Lanczos Method for Computing 
the Singular Values and Corresponding Singular Vectors of a Matrix,” ACM Trans. 
Math. Soft. 7, 149-69. 

J. Cullum, R.A. Willoughby, and M. Lake (1983). “A Lanczos Algorithm for Computing 
Singular Values and Vectors of Large Matrices,” SIAM J. Sci. and Stat. Comp. 4, 
197-215. 

Y. Saad (1987). “On the Lanczos Method for Solving Symmetric Systems with Several 
Right Hand Sides,” Math. Comp. 48, 651-662. 

M. Berry and G.H. Golub (1991). “Estimating the Largest Singular Values of Large 
Sparse Matrices via Modified Moments,” Numerical Algorithms 1, 353-374. 

C.C. Paige, B.N. Parlett, and H.A. Van Der Vorst (1995). “Approximate Solutions and
Eigenvalue Bounds from Krylov Subspaces,” Numer. Linear Algebra with Applic. 2, 
115-134. 
9.4 Arnoldi and Unsymmetric Lanczos


If $A$ is not symmetric, then the orthogonal tridiagonalization $Q^TAQ = T$ does not exist in general. There are two ways to proceed. The Arnoldi approach involves the column-by-column generation of an orthogonal $Q$ such that $Q^TAQ = H$ is the Hessenberg reduction of §7.4. The unsymmetric Lanczos approach computes the columns of $Q = [q_1,\ldots,q_n]$ and $P = [p_1,\ldots,p_n]$ so that $P^TAQ = T$ is tridiagonal and $P^TQ = I_n$. Both methods are interesting as large sparse unsymmetric eigenvalue solvers and both can be adapted for sparse unsymmetric $Ax = b$ solving. (See §10.4.)


9.4.1 The Basic Arnoldi Iteration 


One way to extend the Lanczos process to unsymmetric matrices is due to Arnoldi (1951) and revolves around the Hessenberg reduction $Q^TAQ = H$. In particular, if $Q = [q_1,\ldots,q_n]$ and we compare columns in $AQ = QH$, then

$$Aq_k = \sum_{i=1}^{k+1} h_{ik}q_i, \qquad 1 \leq k \leq n-1.$$

Isolating the last term in the summation gives

$$h_{k+1,k}q_{k+1} = Aq_k - \sum_{i=1}^{k} h_{ik}q_i \equiv r_k$$

where $h_{ik} = q_i^TAq_k$ for $i = 1{:}k$. It follows that if $r_k \neq 0$, then $q_{k+1}$ is specified by

$$q_{k+1} = r_k/h_{k+1,k}$$

where $h_{k+1,k} = \|r_k\|_2$. These equations define the Arnoldi process and in strict analogy to the symmetric Lanczos process (9.1.3) we obtain:


    r_0 = q_1
    h_{10} = 1
    k = 0
    while (h_{k+1,k} ≠ 0)
        q_{k+1} = r_k/h_{k+1,k}
        k = k + 1
        r_k = Aq_k                         (9.4.1)
        for i = 1:k
            h_{ik} = q_i^T r_k
            r_k = r_k − h_{ik}q_i
        end
        h_{k+1,k} = ‖r_k‖_2
    end
We assume that $q_1$ is a given unit 2-norm starting vector. The $q_k$ are called the Arnoldi vectors and they define an orthonormal basis for the Krylov subspace $\mathcal{K}(A, q_1, k)$:

$$\mathrm{span}\{q_1,\ldots,q_k\} = \mathrm{span}\{q_1, Aq_1, \ldots, A^{k-1}q_1\}. \tag{9.4.2}$$


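The situation after $k$ steps is summarized by the $k$-step Arnoldi factorization

$$AQ_k = Q_kH_k + r_ke_k^T \tag{9.4.3}$$

where $Q_k = [q_1,\ldots,q_k]$, $e_k = I_k(:,k)$, and

$$H_k = \begin{bmatrix} h_{11} & h_{12} & \cdots & \cdots & h_{1k} \\ h_{21} & h_{22} & \cdots & \cdots & h_{2k} \\ 0 & h_{32} & \ddots & & \vdots \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & h_{k,k-1} & h_{kk} \end{bmatrix}.$$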
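The Arnoldi loop (9.4.1) is short enough to transcribe directly. The Python sketch below assumes a small dense $A$ stored as a list of rows and a unit 2-norm `q1`; it returns the Arnoldi vectors $q_1,\ldots,q_{k+1}$, a $(k+1)$-by-$k$ array of the $h_{ij}$ (whose leading $k$-by-$k$ block is $H_k$ in (9.4.3)), and the number of steps actually taken. As in (9.4.1), the inner loop subtracts each projection as soon as it is computed (the modified Gram-Schmidt arrangement), and there is no reorthogonalization. The function name `arnoldi` and the breakdown tolerance `tol` are hypothetical.

```python
import math


def matvec(A, x):
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def arnoldi(A, q1, m, tol=1e-14):
    """Run up to m steps of (9.4.1) from the unit vector q1."""
    Q = [list(q1)]
    H = [[0.0] * m for _ in range(m + 1)]
    for k in range(m):
        r = matvec(A, Q[k])                              # r_k = A q_k
        for i in range(k + 1):
            H[i][k] = dot(Q[i], r)                       # h_ik = q_i^T r_k
            r = [rj - H[i][k] * qj for rj, qj in zip(r, Q[i])]
        H[k + 1][k] = math.sqrt(dot(r, r))               # h_{k+1,k} = ||r_k||_2
        if H[k + 1][k] <= tol:                           # invariant subspace found
            return Q, H, k + 1
        Q.append([rj / H[k + 1][k] for rj in r])         # q_{k+1} = r_k / h_{k+1,k}
    return Q, H, m
```

Step $k$ touches all $k$ previous Arnoldi vectors, which is the $O(kn)$ cost per step that makes long Arnoldi runs expensive.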


If $r_k = 0$, then the columns of $Q_k$ define an invariant subspace and $\lambda(H_k) \subseteq \lambda(A)$. Otherwise, the focus is on how to extract information about $A$'s eigensystem from the Hessenberg matrix $H_k$ and the matrix $Q_k$ of Arnoldi vectors.

If $y \in \mathbb{R}^k$ is a unit 2-norm eigenvector for $H_k$ and $H_ky = \lambda y$, then from (9.4.3)

$$(A - \lambda I)x = (e_k^Ty)\,r_k$$

where $x = Q_ky$. We call $\lambda$ a Ritz value and $x$ the corresponding Ritz vector. The size of $|e_k^Ty|\,\|r_k\|_2$ can be used to obtain error bounds, although the relevant perturbation theorems are not as routine to apply as in the symmetric case.

Some numerical properties of the Arnoldi iteration are discussed in Wilkinson (1965, p. 382). As with the symmetric Lanczos iteration, loss of orthogonality among the $q_i$ is an issue. But two other features of (9.4.1) must be addressed before a practical Arnoldi eigensolver can be obtained:


• The Arnoldi vectors $q_1,\ldots,q_k$ are referenced in step $k$ and the computation of $H_k(1{:}k,k)$ involves $O(kn)$ flops. Thus, there is a steep penalty associated with the generation of long Arnoldi sequences.

• The eigenvalues of $H_k$ do not approximate the eigenvalues of $A$ in the style of Kaniel and Paige. This is in contrast to the symmetric case where information about $A$'s extremal eigenvalues emerges quickly. With Arnoldi, the early extraction of eigenvalue information depends crucially on the choice of $q_1$.


These realities suggest a framework in which we use Arnoldi with repeated, 
carefully chosen restarts and a controlled iteration maximum. (Recall the 
s-step Lanczos process of §9.2.7.) 
9.4.2 Arnoldi with Restarting


Consider running Arnoldi for $m$ steps and then restarting the process with a vector $q_+$ chosen from the span of the Arnoldi vectors $q_1,\ldots,q_m$. Because of the Krylov connection (9.4.2), $q_+$ has the form

$$q_+ = p(A)q_1$$

for some polynomial $p$ of degree at most $m - 1$. If $Av_i = \lambda_iv_i$ for $i = 1{:}n$ and $q_1$ has the eigenvector expansion

$$q_1 = a_1v_1 + \cdots + a_nv_n,$$

then

$$q_+ = a_1p(\lambda_1)v_1 + \cdots + a_np(\lambda_n)v_n.$$

Note that $\mathcal{K}(A, q_+, m)$ is rich in eigenvectors that are emphasized by $p(\lambda)$. That is, if $|p(\lambda_{\rm wanted})|$ is large compared to $|p(\lambda_{\rm unwanted})|$, then the Krylov space $\mathcal{K}(A, q_+, m)$ will have much better approximations to the eigenvector $x_{\rm wanted}$ than to the eigenvector $x_{\rm unwanted}$. (It is possible to couch this argument in terms of Schur vectors and invariant subspaces rather than in terms of particular eigenvectors.)

Thus the act of picking a good restart vector $q_+$ from $\mathcal{K}(A, q_1, m)$ is the act of picking a polynomial “filter” that tunes out unwanted portions of the spectrum. Various heuristics for doing this have been developed based on computed Ritz vectors. See Saad (1980, 1984, 1992).

We describe a method due to Sorensen (1992) that determines the 
restart vector implicitly using the QR iteration with shifts. The restart 
occurs after every m steps and we assume that m > j where j is the num- 
ber of sought-after eigenvalues. The choice of the Arnoldi length parameter 
m depends on the problem dimension n, the effect of orthogonality loss, and 
system storage constraints. 

After $m$ steps we have the Arnoldi factorization

$$AQ_c = Q_cH_c + r_ce_m^T,$$

where $Q_c \in \mathbb{R}^{n\times m}$ has orthonormal columns, $H_c \in \mathbb{R}^{m\times m}$ is upper Hessenberg, and $Q_c^Tr_c = 0$. The subscript “c” stands for “current.” The QR iteration with shifts is then applied to $H_c$:

    H^(1) = H_c
    for i = 1:p
        H^(i) − μ_i I = V_i R_i
        H^(i+1) = R_i V_i + μ_i I
    end
    H_+ = H^(p+1)
Here $p = m - j$ and it is assumed that the implicitly shifted QR process of §7.5.5 is applied. The selection of the shifts will be discussed shortly. The orthogonal matrix $V = V_1\cdots V_p$ has three crucial properties:

(1) $H_+ = V^TH_cV$. This is because $V_i^TH^{(i)}V_i = H^{(i+1)}$.

(2) $[V]_{mi} = 0$ for $i = 1{:}j-1$. This is because each $V_i$ is upper Hessenberg and so $V \in \mathbb{R}^{m\times m}$ has lower bandwidth $p = m - j$.

(3) The first column of $V$ has the form

$$Ve_1 = \alpha(H_c - \mu_pI)(H_c - \mu_{p-1}I)\cdots(H_c - \mu_1I)e_1 \tag{9.4.4}$$

where $\alpha$ is a scalar.
To be convinced of property (3), consider the $p = 2$ case:

$$\begin{aligned} V_1V_2R_2R_1 &= V_1(V_2R_2)R_1 = V_1(H^{(2)} - \mu_2I)R_1 \\ &= V_1(V_1^TH^{(1)}V_1 - \mu_2I)R_1 = (H^{(1)} - \mu_2I)V_1R_1 \\ &= (H^{(1)} - \mu_2I)(H^{(1)} - \mu_1I) = (H_c - \mu_2I)(H_c - \mu_1I). \end{aligned}$$

Since $R_2R_1$ is upper triangular, the first column of $V = V_1V_2$ is a multiple of $(H_c - \mu_2I)(H_c - \mu_1I)e_1$.

We now show how to restart the Arnoldi process using the matrix $V$ to implicitly select the new starting vector. From (1) we obtain the following transformation of (9.4.3):

$$AQ_+ = Q_+H_+ + r_ce_m^TV$$

where $Q_+ = Q_cV$. This is not a new length-$m$ Arnoldi factorization because $e_m^TV$ is not a multiple of $e_m^T$. However, in view of property (2), the first $j$ columns satisfy

$$AQ_+(:,1{:}j) = Q_+(:,1{:}j)H_+(1{:}j,1{:}j) + \bigl(h^+_{j+1,j}\,Q_+(:,j+1) + v_{mj}\,r_c\bigr)e_j^T, \tag{9.4.5}$$

where $h^+_{j+1,j} = H_+(j+1,j)$ and $v_{mj} = [V]_{mj}$. Since the residual term is orthogonal to the columns of $Q_+(:,1{:}j)$, (9.4.5) is a length-$j$ Arnoldi factorization. By “jumping into” the basic Arnoldi iteration at step $j+1$ and performing $p$ steps, we can extend (9.4.5) to a new length-$m$ Arnoldi factorization. Moreover, using property (3) the associated starting vector $q_1^{(\rm new)} = Q_+(:,1)$ has the following characterization:

$$\begin{aligned} Q_+(:,1) = Q_cVe_1 &= \alpha Q_c(H_c - \mu_pI)\cdots(H_c - \mu_1I)e_1 \\ &= \alpha(A - \mu_pI)\cdots(A - \mu_1I)Q_ce_1. \end{aligned} \tag{9.4.6}$$

The last equation follows from the identity

$$(A - \mu I)Q_c = Q_c(H_c - \mu I) + r_ce_m^T$$

and the fact that $e_m^Tf(H_c)e_1 = 0$ for any polynomial $f(\cdot)$ of degree $p - 1$ or less.
Thus, $q_1^{(\rm new)} = p(A)q_1$ where $p(\lambda)$ is the polynomial

$$p(\lambda) = \alpha(\lambda - \mu_1)(\lambda - \mu_2)\cdots(\lambda - \mu_p).$$

This shows that the shifts are the zeros of the filtering polynomial. One interesting choice for the shifts is to compute $\lambda(H_c)$ and to identify the eigenvalues of interest $\lambda_1,\ldots,\lambda_j$:

$$\lambda(H_c) = \{\lambda_1,\ldots,\lambda_j\} \cup \{\lambda_{j+1},\ldots,\lambda_m\}.$$

Setting $\mu_i = \lambda_{j+i}$ for $i = 1{:}p$ is one way of generating a filter polynomial that de-emphasizes the unwanted portion of the spectrum.
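The filtering effect of the shifts can be seen with a few lines of Python. The sketch below applies $(A - \mu_pI)\cdots(A - \mu_1I)$ to a starting vector explicitly and normalizes the result. It is an explicit illustration of what the implicit restart achieves, not the implicitly shifted QR mechanism itself. To keep the effect exact and easy to read, it assumes a diagonal $A$ and uses the unwanted eigenvalues themselves as the shifts (“exact shifts”); `apply_filter` is a hypothetical name.

```python
import math


def matvec(A, x):
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


def apply_filter(A, q, shifts):
    """Return p(A) q / ||p(A) q||_2 with p(lambda) = prod_i (lambda - mu_i)."""
    for mu in shifts:
        Aq = matvec(A, q)
        q = [aqi - mu * qi for aqi, qi in zip(Aq, q)]   # q <- (A - mu I) q
    nrm = math.sqrt(sum(t * t for t in q))
    return [t / nrm for t in q]


# A = diag(10, 5, 1, 0.5): keep the two largest eigenvalues (j = 2) and
# use the two unwanted ones as shifts (p = 2).
A = [[10.0, 0.0, 0.0, 0.0],
     [0.0, 5.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.0],
     [0.0, 0.0, 0.0, 0.5]]
q1 = [0.5, 0.5, 0.5, 0.5]
q_plus = apply_filter(A, q1, [1.0, 0.5])
```

Here the components of `q_plus` along the eigenvectors for 1 and 0.5 vanish, while the two wanted components are reweighted by $p(10)/p(5) = 85.5/18 = 4.75$; with inexact (Ritz value) shifts the unwanted components are damped rather than removed.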

We have just presented the rudiments of the implicitly restarted Arnoldi 
method. It has many attractive attributes. For implementation details and 
further analysis, see Lehoucq and Sorensen (1996) and Morgan (1996). 


9.4.3 Unsymmetric Lanczos Tridiagonalization 


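Another way to extend the symmetric Lanczos process is to reduce $A$ to tridiagonal form using a general similarity transformation. Suppose $A \in \mathbb{R}^{n\times n}$ and that a nonsingular matrix $Q$ exists so that

$$Q^{-1}AQ = T = \begin{bmatrix} \alpha_1 & \gamma_1 & & 0 \\ \beta_1 & \alpha_2 & \ddots & \\ & \ddots & \ddots & \gamma_{n-1} \\ 0 & & \beta_{n-1} & \alpha_n \end{bmatrix}.$$

With the column partitionings

$$Q = [q_1,\ldots,q_n], \qquad Q^{-T} = P = [p_1,\ldots,p_n],$$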


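we find upon comparing columns in $AQ = QT$ and $A^TP = PT^T$ that

$$\begin{aligned} Aq_k &= \gamma_{k-1}q_{k-1} + \alpha_kq_k + \beta_kq_{k+1}, & \gamma_0q_0 &\equiv 0 \\ A^Tp_k &= \beta_{k-1}p_{k-1} + \alpha_kp_k + \gamma_kp_{k+1}, & \beta_0p_0 &\equiv 0 \end{aligned}$$

for $k = 1{:}n-1$. These equations together with the biorthogonality condition $P^TQ = I_n$ imply

$$\alpha_k = p_k^TAq_k$$

and

$$\begin{aligned} \beta_kq_{k+1} &\equiv r_k = (A - \alpha_kI)q_k - \gamma_{k-1}q_{k-1} \\ \gamma_kp_{k+1} &\equiv s_k = (A - \alpha_kI)^Tp_k - \beta_{k-1}p_{k-1}. \end{aligned}$$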
There is some flexibility in choosing the scale factors $\beta_k$ and $\gamma_k$. Note that

$$1 = p_{k+1}^Tq_{k+1} = (s_k/\gamma_k)^T(r_k/\beta_k).$$

It follows that once $\beta_k$ is specified $\gamma_k$ is given by

$$\gamma_k = s_k^Tr_k/\beta_k.$$

With the “canonical” choice $\beta_k = \|r_k\|_2$ we obtain


    q_1, p_1 given unit 2-norm vectors with p_1^T q_1 ≠ 0.
    k = 0
    q_0 = 0; r_0 = q_1
    p_0 = 0; s_0 = p_1
    while (r_k ≠ 0) ∧ (s_k ≠ 0) ∧ (s_k^T r_k ≠ 0)
        β_k = ‖r_k‖_2
        γ_k = s_k^T r_k/β_k
        q_{k+1} = r_k/β_k
        p_{k+1} = s_k/γ_k
        k = k + 1                               (9.4.7)
        α_k = p_k^T A q_k
        r_k = (A − α_k I)q_k − γ_{k−1}q_{k−1}
        s_k = (A − α_k I)^T p_k − β_{k−1}p_{k−1}
    end
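If

$$T_k = \begin{bmatrix} \alpha_1 & \gamma_1 & & 0 \\ \beta_1 & \alpha_2 & \ddots & \\ & \ddots & \ddots & \gamma_{k-1} \\ 0 & & \beta_{k-1} & \alpha_k \end{bmatrix},$$

then the situation at the bottom of the loop is summarized by the equations

$$A[q_1,\ldots,q_k] = [q_1,\ldots,q_k]\,T_k + r_ke_k^T \tag{9.4.8}$$

$$A^T[p_1,\ldots,p_k] = [p_1,\ldots,p_k]\,T_k^T + s_ke_k^T. \tag{9.4.9}$$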
If $r_k = 0$, then the iteration terminates and $\mathrm{span}\{q_1,\ldots,q_k\}$ is an invariant subspace for $A$. If $s_k = 0$, then the iteration also terminates and $\mathrm{span}\{p_1,\ldots,p_k\}$ is an invariant subspace for $A^T$. However, if neither of these conditions is true and $s_k^Tr_k = 0$, then the tridiagonalization process ends without any invariant subspace information. This is called serious breakdown. See Wilkinson (1965, p. 389) for an early discussion of the matter.
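A Python transcription of (9.4.7) makes the bookkeeping concrete. It is a sketch that assumes a small dense $A$ stored as a list of rows and replaces the exact termination tests with a tolerance `tol`; it performs no look-ahead, so it simply stops at a (near) serious breakdown. The name `unsym_lanczos` is hypothetical. The lists `betas` and `gammas` start with the normalizers produced on the first pass (which rescale $p_1$ so that $p_1^Tq_1 = 1$), so `betas[k]` and `gammas[k]` are the entries $\beta_k$ and $\gamma_k$ of $T_k$.

```python
import math


def matvec(A, x):
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


def rmatvec(A, y):
    """Compute A^T y without forming A^T."""
    return [sum(A[i][j] * y[i] for i in range(len(A))) for j in range(len(A[0]))]


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def unsym_lanczos(A, q1, p1, steps, tol=1e-14):
    """Run up to `steps` passes of (9.4.7); returns Q, P, alphas, betas, gammas."""
    n = len(q1)
    Q, P, alphas, betas, gammas = [], [], [], [], []
    r, s = list(q1), list(p1)                    # r_0 = q_1, s_0 = p_1
    for _ in range(steps):
        beta = math.sqrt(dot(r, r))
        sr = dot(s, r)
        if beta <= tol or math.sqrt(dot(s, s)) <= tol or abs(sr) <= tol:
            break                                 # invariant subspace or serious breakdown
        gamma = sr / beta                         # gamma_k = s_k^T r_k / beta_k
        betas.append(beta)
        gammas.append(gamma)
        Q.append([ri / beta for ri in r])         # q_{k+1} = r_k / beta_k
        P.append([si / gamma for si in s])        # p_{k+1} = s_k / gamma_k
        q, p = Q[-1], P[-1]
        q_old = Q[-2] if len(Q) > 1 else [0.0] * n
        p_old = P[-2] if len(P) > 1 else [0.0] * n
        Aq = matvec(A, q)
        alpha = dot(p, Aq)                        # alpha_k = p_k^T A q_k
        alphas.append(alpha)
        # r_k = (A - alpha_k I) q_k - gamma_{k-1} q_{k-1}
        r = [aqi - alpha * qi - gamma * qo for aqi, qi, qo in zip(Aq, q, q_old)]
        # s_k = (A - alpha_k I)^T p_k - beta_{k-1} p_{k-1}
        s = [atp - alpha * pi - beta * po for atp, pi, po in zip(rmatvec(A, p), p, p_old)]
    return Q, P, alphas, betas, gammas
```

Unlike the symmetric case, two three-term recurrences run side by side and only their product $s_k^Tr_k$ is used to couple them, so a vanishing inner product can stop the process even though neither $r_k$ nor $s_k$ is small.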
9.4.4 The Look-Ahead Idea


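It is interesting to look at the serious breakdown issue in the block version of (9.4.7). For clarity assume that $A \in \mathbb{R}^{n\times n}$ with $n = rp$. Consider the factorization

$$P^TAQ = \begin{bmatrix} M_1 & C_1^T & & 0 \\ B_1 & M_2 & \ddots & \\ & \ddots & \ddots & C_{r-1}^T \\ 0 & & B_{r-1} & M_r \end{bmatrix} \tag{9.4.10}$$

where all the blocks are $p$-by-$p$. Let $Q = [Q_1,\ldots,Q_r]$ and $P = [P_1,\ldots,P_r]$ be conformable partitionings of $Q$ and $P$. Comparing block columns in the equations $AQ = QT$ and $A^TP = PT^T$ we obtain

$$\begin{aligned} Q_{k+1}B_k &= AQ_k - Q_kM_k - Q_{k-1}C_{k-1}^T \equiv R_k \\ P_{k+1}C_k &= A^TP_k - P_kM_k^T - P_{k-1}B_{k-1}^T \equiv S_k \end{aligned}$$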


Note that $M_k = P_k^TAQ_k$. If $S_k^TR_k \in \mathbb{R}^{p\times p}$ is nonsingular and we compute $B_k, C_k \in \mathbb{R}^{p\times p}$ so that

$$C_k^TB_k = S_k^TR_k,$$

then

$$Q_{k+1} = R_kB_k^{-1} \tag{9.4.11}$$

$$P_{k+1} = S_kC_k^{-1} \tag{9.4.12}$$

satisfy $P_{k+1}^TQ_{k+1} = I_p$. Serious breakdown in this setting is associated with having a singular $S_k^TR_k$.

One way of solving the serious breakdown problem in (9.4.7) is to go after a factorization of the form (9.4.10) in which the block sizes are dynamically determined. Roughly speaking, in this approach matrices $Q_{k+1}$ and $P_{k+1}$ are built up column by column with special recursions that culminate in the production of a nonsingular $P_{k+1}^TQ_{k+1}$. The computations are arranged so that the biorthogonality conditions $P_i^TQ_{k+1} = 0$ and $Q_i^TP_{k+1} = 0$ hold for $i = 1{:}k$.

A method of this form belongs to the family of look-ahead Lanczos methods. The length of a look-ahead step is the width of the $Q_{k+1}$ and $P_{k+1}$ that it produces. If that width is one, a conventional block Lanczos step may be taken. Length-2 look-ahead steps are discussed in Parlett, Taylor, and Liu (1985). The notion of incurable breakdown is also presented by these authors. Freund, Gutknecht, and Nachtigal (1993) cover the general case along with a host of implementation details. Floating point considerations require the handling of “near” serious breakdown. In practice, each $M_k$ that is 2-by-2 or larger corresponds to an instance of near serious breakdown.


Problems 


P9.4.1 Prove that the Arnoldi vectors in (9.4.1) are mutually orthogonal. 
P9.4.2 Prove (9.4.4). 
P9.4.3 Prove (9.4.6). 


P9.4.4 Give an example of a starting vector for which the unsymmetric Lanczos iteration 
(9.4.7) breaks down without rendering any invariant subspace information. Use 


P9.4.5 Suppose $H \in \mathbb{R}^{n\times n}$ is upper Hessenberg. Discuss the computation of a unit
upper triangular matrix $U$ such that $HU = UT$ where $T$ is tridiagonal.


P9.4.6 Show that the QR algorithm for eigenvalues does not preserve tridiagonal struc- 
ture in the unsymmetric case. 
Chapter 10


Iterative Methods for 
Linear Systems 


§10.1 The Standard Iterations

§10.2 The Conjugate Gradient Method 
§10.3 Preconditioned Conjugate Gradients 
§10.4 Other Krylov Subspace Methods 


We concluded the previous chapter by showing how the Lanczos iteration could be used to solve various linear equation and least squares problems. The methods developed were suitable for large sparse problems because they did not require the factorization of the underlying matrix. In this chapter, we continue the discussion of linear equation solvers that have this property.

The first section is a brisk review of the classical iterations: Jacobi, 
Gauss-Seidel, SOR, Chebyshev semi-iterative, and so on. Our treatment of 
these methods is brief because our principal aim in this chapter is to high- 
light the method of conjugate gradients. In §10.2, we carefully develop this 
important technique in a natural way from the method of steepest descent. 
Recall that the conjugate gradient method has already been introduced via 
the Lanczos iteration in §9.3. The reason for deriving the method again is
to motivate some of its practical variants, which are the subject of §10.3. 
Extensions to unsymmetric problems are treated in §10.4. 

We warn the reader of an inconsistency in the notation of this chapter. In §10.1, methods are developed at the “(i, j) level” necessitating the use of superscripts: $x_i^{(k)}$ denotes the $i$-th component of a vector $x^{(k)}$. In the other sections, however, algorithmic developments can proceed without explicit mention of vector/matrix entries. Hence, in §10.2-§10.4 we dispense with superscripts and denote vector sequences by $\{x_k\}$.


Before You Begin 


Chapter 1, §§2.1-2.5, and §2.7, Chapter 3, and §§4.1-4.3 are assumed.
Other dependencies include: 


    §10.1 → §10.2 → §10.3 → §10.4
    (with further dependency arrows into this chain from Chapter 9 and from §7.4)


Texts devoted to iterative solvers include Varga (1962), Young (1971),
Hageman and Young (1981), and Axelsson (1994). The software “templates”
volume by Barrett et al. (1993) is particularly useful. The direct
(non-iterative) solution of large sparse systems is sometimes preferred. See
George and Liu (1981) and Duff, Erisman, and Reid (1986).


10.1 The Standard Iterations 


The linear equation solvers in Chapters 3 and 4 involve the factorization 
of the coefficient matrix A. Methods of this type are called direct methods. 
Direct methods can be impractical if A is large and sparse, because the 
sought-after factors can be dense. An exception to this occurs when A is 
banded (cf. §4.3). Yet in many band matrix problems even the band itself
is sparse, making algorithms such as band Cholesky difficult to implement.

One reason for the great interest in sparse linear equation solvers is the 
importance of being able to obtain numerical solutions to partial differ- 
ential equations. Indeed, researchers in computational PDE’s have been 
responsible for many of the sparse matrix techniques that are presently in 
general use. 

Roughly speaking, there are two approaches to the sparse $Ax = b$ problem. One is to pick an appropriate direct method and adapt it to exploit
A’s sparsity. Typical adaptation strategies involve the intelligent use of 
data structures and special pivoting strategies that minimize fill-in. 

In contrast to the direct methods are the iterative methods. These methods generate a sequence of approximate solutions $\{x^{(k)}\}$ and essentially involve the matrix $A$ only in the context of matrix-vector multiplication. The evaluation of an iterative method invariably focuses on how quickly the iterates $x^{(k)}$ converge. In this section, we present some basic iterative methods, discuss their practical implementation, and prove a few representative theorems concerned with their behavior.

10.1.1 The Jacobi and Gauss-Seidel Iterations


Perhaps the simplest iterative scheme is the Jacobi iteration. It is defined 
for matrices that have nonzero diagonal elements. The method can be 
motivated by rewriting the 3-by-3 system $Ax = b$ as follows:

$$
\begin{aligned}
x_1 &= (b_1 - a_{12}x_2 - a_{13}x_3)/a_{11} \\
x_2 &= (b_2 - a_{21}x_1 - a_{23}x_3)/a_{22} \\
x_3 &= (b_3 - a_{31}x_1 - a_{32}x_2)/a_{33}
\end{aligned}
$$


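Suppose $x^{(k)}$ is an approximation to $x = A^{-1}b$. A natural way to generate
a new approximation $x^{(k+1)}$ is to compute

$$
\begin{aligned}
x_1^{(k+1)} &= (b_1 - a_{12}x_2^{(k)} - a_{13}x_3^{(k)})/a_{11} \\
x_2^{(k+1)} &= (b_2 - a_{21}x_1^{(k)} - a_{23}x_3^{(k)})/a_{22} \qquad (10.1.1) \\
x_3^{(k+1)} &= (b_3 - a_{31}x_1^{(k)} - a_{32}x_2^{(k)})/a_{33}
\end{aligned}
$$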


This defines the Jacobi iteration for the case $n = 3$. For general $n$ we have

for i = 1:n
    $x_i^{(k+1)} = \Big(b_i - \sum_{j=1}^{i-1} a_{ij}x_j^{(k)} - \sum_{j=i+1}^{n} a_{ij}x_j^{(k)}\Big)\Big/a_{ii}$        (10.1.2)
end
Note that in the Jacobi iteration one does not use the most recently available
information when computing $x_i^{(k+1)}$. For example, $x_1^{(k)}$ is used in the
calculation of $x_2^{(k+1)}$ even though component $x_1^{(k+1)}$ is known. If we revise
the Jacobi iteration so that we always use the most current estimate of the
exact $x_i$, then we obtain

for i = 1:n
    $x_i^{(k+1)} = \Big(b_i - \sum_{j=1}^{i-1} a_{ij}x_j^{(k+1)} - \sum_{j=i+1}^{n} a_{ij}x_j^{(k)}\Big)\Big/a_{ii}$        (10.1.3)
end

This defines what is called the Gauss-Seidel iteration.
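For both the Jacobi and Gauss-Seidel iterations, the transition from
$x^{(k)}$ to $x^{(k+1)}$ can be succinctly described in terms of the matrices $L$, $D$,
and $U$ defined by:

$$
L = \begin{bmatrix}
0 & & \cdots & 0 \\
a_{21} & 0 & & \vdots \\
\vdots & \ddots & \ddots & \\
a_{n1} & \cdots & a_{n,n-1} & 0
\end{bmatrix},
\qquad
D = \mathrm{diag}(a_{11}, \ldots, a_{nn}),
\qquad
U = \begin{bmatrix}
0 & a_{12} & \cdots & a_{1n} \\
\vdots & \ddots & \ddots & \vdots \\
 & & 0 & a_{n-1,n} \\
0 & \cdots & & 0
\end{bmatrix}.
\qquad (10.1.4)
$$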


In particular, the Jacobi step has the form $M_J x^{(k+1)} = N_J x^{(k)} + b$ where
$M_J = D$ and $N_J = -(L + U)$. On the other hand, Gauss-Seidel is defined
by $M_G x^{(k+1)} = N_G x^{(k)} + b$ with $M_G = D + L$ and $N_G = -U$.
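The two sweeps differ only in one respect: Jacobi writes its updates into a fresh vector, while Gauss-Seidel overwrites $x$ in place. Below is a minimal Python sketch. It assumes a dense matrix stored as a list of rows with a nonzero diagonal. The function names and the 3-by-3 test system are illustrative choices, not part of the text.

```python
def jacobi_step(A, b, x):
    """One Jacobi sweep (10.1.2): every component is computed from the old x."""
    n = len(b)
    x_new = [0.0] * n
    for i in range(n):
        s = sum(A[i][j] * x[j] for j in range(n) if j != i)
        x_new[i] = (b[i] - s) / A[i][i]
    return x_new


def gauss_seidel_step(A, b, x):
    """One Gauss-Seidel sweep (10.1.3): x is overwritten, so when x[i] is
    computed, x[0..i-1] already hold their new (k+1) values."""
    n = len(b)
    for i in range(n):
        s = sum(A[i][j] * x[j] for j in range(n) if j != i)
        x[i] = (b[i] - s) / A[i][i]
    return x


# Illustrative strictly diagonally dominant system whose solution is (1, 1, 1).
A = [[4.0, -1.0, 0.0],
     [-1.0, 4.0, -1.0],
     [0.0, -1.0, 4.0]]
b = [3.0, 2.0, 3.0]
x_jac = [0.0, 0.0, 0.0]
x_gs = [0.0, 0.0, 0.0]
for _ in range(50):
    x_jac = jacobi_step(A, b, x_jac)
    x_gs = gauss_seidel_step(A, b, x_gs)
print(x_jac, x_gs)
```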


10.1.2  Splittings and Convergence 


The Jacobi and Gauss-Seidel procedures are typical members of a large 
family of iterations that have the form 


$$M x^{(k+1)} = N x^{(k)} + b \qquad (10.1.5)$$


where A = M-N is a splitting of the matrix A. For the iteration (10.1.5) 
to be practical, it must be “easy” to solve a linear system with M as the 
matrix. Note that for Jacobi and Gauss-Seidel, M is diagonal and lower 
triangular respectively. 

Whether or not (10.1.5) converges to $x = A^{-1}b$ depends upon the
eigenvalues of $M^{-1}N$. In particular, if the spectral radius of an n-by-n matrix
$G$ is defined by

$$\rho(G) = \max\{\, |\lambda| : \lambda \in \lambda(G) \,\},$$

then it is the size of $\rho(M^{-1}N)$ that is critical to the success of (10.1.5).


Theorem 10.1.1 Suppose $b \in \mathbb{R}^n$ and $A = M - N \in \mathbb{R}^{n\times n}$ is nonsingular.
If $M$ is nonsingular and the spectral radius of $M^{-1}N$ satisfies the
inequality $\rho(M^{-1}N) < 1$, then the iterates $x^{(k)}$ defined by
$Mx^{(k+1)} = Nx^{(k)} + b$ converge to $x = A^{-1}b$ for any starting vector $x^{(0)}$.


Proof. Let $e^{(k)} = x^{(k)} - x$ denote the error in the kth iterate. Since
$Mx = Nx + b$ it follows that $M(x^{(k+1)} - x) = N(x^{(k)} - x)$, and thus, the error in
$x^{(k+1)}$ is given by $e^{(k+1)} = M^{-1}Ne^{(k)} = (M^{-1}N)^{k+1}e^{(0)}$. From Lemma
7.3.2 we know that $(M^{-1}N)^k \to 0$ iff $\rho(M^{-1}N) < 1$. □


This result is central to the study of iterative methods where algorithmic 
development typically proceeds along the following lines: 


• A splitting $A = M - N$ is proposed where linear systems of the form
$Mz = d$ are “easy” to solve.

• Classes of matrices are identified for which the iteration matrix
$G = M^{-1}N$ satisfies $\rho(G) < 1$.

• Further results about $\rho(G)$ are established to gain intuition about
how the error $e^{(k)}$ tends to zero.


For example, consider the Jacobi iteration, $Dx^{(k+1)} = -(L + U)x^{(k)} + b$.
One condition that guarantees $\rho(M_J^{-1}N_J) < 1$ is strict diagonal dominance.
Indeed, if $A$ has that property (defined in §3.4.10), then

$$
\rho(M_J^{-1}N_J) \le \| D^{-1}(L + U) \|_\infty = \max_{1 \le i \le n} \sum_{\substack{j=1 \\ j \ne i}}^{n} \left| \frac{a_{ij}}{a_{ii}} \right| < 1.
$$

Usually, the “more dominant” the diagonal the more rapid the convergence,
but there are counterexamples. See P10.1.7.
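The diagonal dominance bound can be evaluated directly. The sketch below (the function name is hypothetical) computes $\| D^{-1}(L+U) \|_\infty$ for a dense list-of-rows matrix. A value below 1 certifies that Jacobi converges. As P10.1.7 warns, however, it does not pin down the actual rate.

```python
def jacobi_dominance_bound(A):
    """Return ||D^{-1}(L + U)||_inf = max_i sum_{j != i} |a_ij / a_ii|.

    The spectral radius of the Jacobi iteration matrix is bounded by this
    value, so a result < 1 (strict diagonal dominance) implies convergence.
    """
    n = len(A)
    return max(sum(abs(A[i][j] / A[i][i]) for j in range(n) if j != i)
               for i in range(n))


A = [[4.0, -1.0, 0.0],
     [-1.0, 4.0, -1.0],
     [0.0, -1.0, 4.0]]
bound = jacobi_dominance_bound(A)
print(bound)  # 0.5
```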

A more complicated spectral radius argument is needed to show that 
Gauss-Seidel converges for symmetric positive definite A. 


Theorem 10.1.2 If $A \in \mathbb{R}^{n\times n}$ is symmetric and positive definite, then the
Gauss-Seidel iteration (10.1.3) converges for any $x^{(0)}$.


Proof. Write $A = L + D + L^T$ where $D = \mathrm{diag}(a_{ii})$ and $L$ is strictly lower
triangular. In light of Theorem 10.1.1 our task is to show that the matrix
$G = -(D + L)^{-1}L^T$ has eigenvalues that are inside the unit circle. Since
$D$ is positive definite we have $G_1 \equiv D^{1/2}GD^{-1/2} = -(I + L_1)^{-1}L_1^T$,
where $L_1 = D^{-1/2}LD^{-1/2}$. Since $G$ and $G_1$ have the same eigenvalues,
we must verify that $\rho(G_1) < 1$. If $G_1x = \lambda x$ with $x^Hx = 1$, then we
have $-L_1^Tx = \lambda(I + L_1)x$ and thus, $-x^HL_1^Tx = \lambda(1 + x^HL_1x)$. Letting
$a + bi = x^HL_1x$ we have

$$
|\lambda|^2 = \left| \frac{-a + bi}{1 + a + bi} \right|^2 = \frac{a^2 + b^2}{1 + 2a + a^2 + b^2}.
$$

However, since $D^{-1/2}AD^{-1/2} = I + L_1 + L_1^T$ is positive definite, it is not
hard to show that $0 < 1 + x^HL_1x + x^HL_1^Tx = 1 + 2a$, implying $|\lambda| < 1$. □


This result is frequently applicable because many of the matrices that arise 
from discretized elliptic PDE’s are symmetric positive definite. Numerous 
other results of this flavor appear in the literature. 


10.1.3 Practical Implementation of Gauss-Seidel 


We now focus on some practical details associated with the Gauss-Seidel
iteration. With overwriting, the Gauss-Seidel step (10.1.3) is particularly
simple to implement:

for i = 1:n
    $x_i = \Big(b_i - \sum_{j=1}^{i-1} a_{ij}x_j - \sum_{j=i+1}^{n} a_{ij}x_j\Big)\Big/a_{ii}$
end


This computation requires about twice as many flops as there are nonzero 
entries in the matrix A. It makes no sense to be more precise about the 
work involved because the actual implementation depends greatly upon the 
structure of the problem at hand. 

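In order to stress this point we consider the application of (10.1.3) to
the NM-by-NM block tridiagonal system

$$
\begin{bmatrix}
T & -I_N & & 0 \\
-I_N & T & \ddots & \\
 & \ddots & \ddots & -I_N \\
0 & & -I_N & T
\end{bmatrix}
\begin{bmatrix} g_1 \\ g_2 \\ \vdots \\ g_M \end{bmatrix}
=
\begin{bmatrix} f_1 \\ f_2 \\ \vdots \\ f_M \end{bmatrix}
\qquad (10.1.6)
$$

where

$$
T = \begin{bmatrix}
4 & -1 & & 0 \\
-1 & 4 & \ddots & \\
 & \ddots & \ddots & -1 \\
0 & & -1 & 4
\end{bmatrix},
\qquad
g_j = \begin{bmatrix} G(1,j) \\ G(2,j) \\ \vdots \\ G(N,j) \end{bmatrix},
\qquad
f_j = \begin{bmatrix} F(1,j) \\ F(2,j) \\ \vdots \\ F(N,j) \end{bmatrix}.
$$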


This problem arises when the Poisson equation is discretized on a rectangle. 
It is easy to show that the matrix A is positive definite. 

With the convention that $G(i,j) = 0$ whenever $i \in \{0, N+1\}$ or
$j \in \{0, M+1\}$, we see that with overwriting the Gauss-Seidel step takes on
the form:

for j = 1:M
    for i = 1:N
        G(i,j) = (F(i,j) + G(i-1,j) + G(i+1,j) + G(i,j-1) + G(i,j+1))/4
    end
end


Note that in this problem no storage is required for the matrix A. 
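Here is a Python sketch of the overwriting sweep. It assumes $G$ and $F$ are stored as (N+2)-by-(M+2) nested lists, so the zero boundary convention needs no special cases. The manufactured test (pick interior values, form $F$ from them, then recover them) is an illustrative assumption. As in the text, $A$ itself is never stored.

```python
def poisson_gs_sweep(G, F, N, M):
    """One overwriting Gauss-Seidel sweep for the block tridiagonal system (10.1.6).

    G and F are (N+2)-by-(M+2) nested lists. The border entries of G
    (i in {0, N+1} or j in {0, M+1}) stay zero, matching the convention
    G(i, j) = 0 on the boundary. Only the interior is updated.
    """
    for j in range(1, M + 1):
        for i in range(1, N + 1):
            G[i][j] = (F[i][j] + G[i - 1][j] + G[i + 1][j]
                       + G[i][j - 1] + G[i][j + 1]) / 4.0
    return G


# Manufactured test (illustrative): choose interior values, form F = A g, recover g.
N, M = 4, 3
exact = [[0.0] * (M + 2) for _ in range(N + 2)]
for i in range(1, N + 1):
    for j in range(1, M + 1):
        exact[i][j] = float(i * j)
F = [[0.0] * (M + 2) for _ in range(N + 2)]
for i in range(1, N + 1):
    for j in range(1, M + 1):
        F[i][j] = (4.0 * exact[i][j] - exact[i - 1][j] - exact[i + 1][j]
                   - exact[i][j - 1] - exact[i][j + 1])
G = [[0.0] * (M + 2) for _ in range(N + 2)]
for _ in range(200):
    poisson_gs_sweep(G, F, N, M)
max_err = max(abs(G[i][j] - exact[i][j])
              for i in range(N + 2) for j in range(M + 2))
print(max_err)
```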
10.1.4 Successive Over-Relaxation


The Gauss-Seidel iteration is very attractive because of its simplicity.
Unfortunately, if the spectral radius of $M_G^{-1}N_G$ is close to unity, then it may
be prohibitively slow because the error tends to zero like $\rho(M_G^{-1}N_G)^k$. To
rectify this, let $\omega \in \mathbb{R}$ and consider the following modification of the
Gauss-Seidel step:

for i = 1:n
    $x_i^{(k+1)} = \omega\Big(b_i - \sum_{j=1}^{i-1} a_{ij}x_j^{(k+1)} - \sum_{j=i+1}^{n} a_{ij}x_j^{(k)}\Big)\Big/a_{ii} + (1-\omega)x_i^{(k)}$        (10.1.7)
end


This defines the method of successive over-relaxation (SOR). Using (10.1.4)
we see that in matrix terms, the SOR step is given by

$$M_\omega x^{(k+1)} = N_\omega x^{(k)} + \omega b \qquad (10.1.8)$$

where $M_\omega = D + \omega L$ and $N_\omega = (1-\omega)D - \omega U$. For a few structured (but
important) problems such as (10.1.6), the value of the relaxation parameter
$\omega$ that minimizes $\rho(M_\omega^{-1}N_\omega)$ is known. Moreover, a significant reduction
in $\rho(M_1^{-1}N_1) = \rho(M_G^{-1}N_G)$ can result. In more complicated problems,
however, it may be necessary to perform a fairly sophisticated eigenvalue
analysis in order to determine an appropriate $\omega$.
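The following sketch runs SOR on a 1-D analogue of (10.1.6), $A = \mathrm{tridiag}(-1, 2, -1)$. It uses $\omega_{opt} = 2/(1 + \sqrt{1 - \rho_J^2})$, where $\rho_J = \cos(\pi/(n+1))$ is the Jacobi spectral radius. This is Young's classical formula for this kind of model problem and is quoted here as an assumption, not derived in this section. The function names and the residual-based stopping test are illustrative.

```python
import math


def sor_sweep(A, b, x, omega):
    """One SOR sweep (10.1.7), overwriting x; omega = 1 gives Gauss-Seidel."""
    n = len(b)
    for i in range(n):
        s = sum(A[i][j] * x[j] for j in range(n) if j != i)
        x[i] = omega * (b[i] - s) / A[i][i] + (1.0 - omega) * x[i]
    return x


def sweeps_needed(A, b, omega, tol=1e-8, max_sweeps=5000):
    """Count sweeps until ||b - A x||_2 <= tol * ||b||_2, starting from x = 0."""
    n = len(b)
    x = [0.0] * n
    bnorm = math.sqrt(sum(v * v for v in b))
    for k in range(1, max_sweeps + 1):
        sor_sweep(A, b, x, omega)
        r = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        if math.sqrt(sum(v * v for v in r)) <= tol * bnorm:
            return k
    return max_sweeps


# Model problem (assumption): A = tridiag(-1, 2, -1), a 1-D analogue of (10.1.6).
n = 20
A = [[2.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0) for j in range(n)]
     for i in range(n)]
b = [1.0] * n
rho_jacobi = math.cos(math.pi / (n + 1))            # spectral radius of M_J^{-1} N_J
omega_opt = 2.0 / (1.0 + math.sqrt(1.0 - rho_jacobi ** 2))
gs_sweeps = sweeps_needed(A, b, 1.0)
sor_sweeps = sweeps_needed(A, b, omega_opt)
print(gs_sweeps, sor_sweeps)
```

With these parameters, the optimally relaxed sweep should need several times fewer sweeps than plain Gauss-Seidel.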


10.1.5 The Chebyshev Semi-Iterative Method 


Another way to accelerate the convergence of an iterative method makes
use of Chebyshev polynomials. Suppose $x^{(1)}, \ldots, x^{(k)}$ have been generated
via the iteration $Mx^{(j+1)} = Nx^{(j)} + b$ and that we wish to determine
coefficients $\nu_j(k)$, $j = 0{:}k$, such that

$$y^{(k)} = \sum_{j=0}^{k} \nu_j(k)\, x^{(j)} \qquad (10.1.9)$$

represents an improvement over $x^{(k)}$. If $x^{(0)} = \cdots = x^{(k)} = x$, then it is
reasonable to insist that $y^{(k)} = x$. Hence, we require

$$\sum_{j=0}^{k} \nu_j(k) = 1. \qquad (10.1.10)$$

Subject to this constraint, we consider how to choose the $\nu_j(k)$ so that the
error in $y^{(k)}$ is minimized.
Recalling from the proof of Theorem 10.1.1 that $x^{(k)} - x = (M^{-1}N)^k e^{(0)}$
where $e^{(0)} = x^{(0)} - x$, we see that

$$
y^{(k)} - x = \sum_{j=0}^{k} \nu_j(k)\,(x^{(j)} - x) = \sum_{j=0}^{k} \nu_j(k)\,(M^{-1}N)^j e^{(0)}.
$$

Working in the 2-norm we therefore obtain

$$\| y^{(k)} - x \|_2 \le \| p_k(G) \|_2 \, \| e^{(0)} \|_2 \qquad (10.1.11)$$

where $G = M^{-1}N$ and

$$p_k(z) = \sum_{j=0}^{k} \nu_j(k)\, z^j.$$

Note that the condition (10.1.10) implies $p_k(1) = 1$.
At this point we assume that $G$ is symmetric with eigenvalues $\lambda_i$ that
satisfy $-1 < \alpha \le \lambda_n \le \cdots \le \lambda_1 \le \beta < 1$. It follows that

$$
\| p_k(G) \|_2 = \max_{\lambda \in \lambda(G)} |p_k(\lambda)| \le \max_{\alpha \le \lambda \le \beta} |p_k(\lambda)|.
$$

Thus, to make the norm of $p_k(G)$ small, we need a polynomial $p_k(z)$ that
is small on $[\alpha, \beta]$ subject to the constraint that $p_k(1) = 1$.

Consider the Chebyshev polynomials $c_j(z)$ generated by the recursion
$c_j(z) = 2zc_{j-1}(z) - c_{j-2}(z)$ where $c_0(z) = 1$ and $c_1(z) = z$. These polynomials
satisfy $|c_j(z)| \le 1$ on $[-1, 1]$ but grow rapidly off this interval. As a
consequence, the polynomial

$$
p_k(z) = \frac{c_k\!\left(-1 + 2\,\dfrac{z - \alpha}{\beta - \alpha}\right)}{c_k(\mu)},
$$

where

$$
\mu = -1 + 2\,\frac{1 - \alpha}{\beta - \alpha} = 1 + 2\,\frac{1 - \beta}{\beta - \alpha} > 1,
$$

satisfies $p_k(1) = 1$ and tends to be small on $[\alpha, \beta]$. From the definition of
$p_k(z)$ and equation (10.1.11) we see

$$
\| y^{(k)} - x \|_2 \le \frac{\| x - x^{(0)} \|_2}{|c_k(\mu)|}.
$$

Thus, the larger $\mu$ is, the greater the acceleration of convergence.
In order for the above to be a practical acceleration procedure, we need
a more efficient method for calculating $y^{(k)}$ than (10.1.9). We have been
tacitly assuming that $n$ is large and thus the retrieval of $x^{(0)}, \ldots, x^{(k)}$ for
large $k$ would be inconvenient or even impossible.

Fortunately, it is possible to derive a three-term recurrence among the
$y^{(k)}$ by exploiting the three-term recurrence among the Chebyshev polynomials.
In particular, it can be shown that if

$$
\omega_{k+1} = 2\,\frac{2 - \beta - \alpha}{\beta - \alpha}\,\frac{c_k(\mu)}{c_{k+1}(\mu)},
$$

then

$$
\begin{aligned}
y^{(k+1)} &= \omega_{k+1}\left(y^{(k)} - y^{(k-1)} + \gamma z^{(k)}\right) + y^{(k-1)} \\
Mz^{(k)} &= b - Ay^{(k)} \qquad\qquad (10.1.12) \\
\gamma &= 2/(2 - \alpha - \beta),
\end{aligned}
$$

where $y^{(0)} = x^{(0)}$ and $y^{(1)} = x^{(1)}$. We refer to this scheme as the Chebyshev
semi-iterative method associated with $My^{(k+1)} = Ny^{(k)} + b$. For the
acceleration to be effective we need good lower and upper bounds $\alpha$ and $\beta$.
As in SOR, these parameters may be difficult to ascertain except in a few
structured problems.
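The sketch below applies (10.1.12) to the Jacobi splitting $M = \mathrm{diag}(A)$. It assumes the model matrix $A = \mathrm{tridiag}(-1, 2, -1)$, whose Jacobi iteration matrix is symmetric with eigenvalues $\cos(j\pi/(n+1))$. Hence $\beta = \cos(\pi/(n+1))$ and $\alpha = -\beta$ are valid bounds.

Two implementation choices are ours:

- Instead of evaluating $c_k(\mu)$ directly, the sketch updates $\omega_{k+1} = 1/(1 - \omega_k/(4\mu^2))$, which follows from $c_{k+1}(\mu) = 2\mu c_k(\mu) - c_{k-1}(\mu)$.
- The first step is $y^{(1)} = y^{(0)} + \gamma z^{(0)}$. This coincides with $x^{(1)}$ here because $\gamma = 1$ when $\alpha = -\beta$.

```python
import math


def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def chebyshev_jacobi(A, b, alpha, beta, steps):
    """Chebyshev semi-iterative method (10.1.12) for the Jacobi splitting M = diag(A).

    Assumes G = M^{-1} N is symmetric with eigenvalues in [alpha, beta],
    -1 < alpha < beta < 1. Starts from y^(0) = 0 and returns y^(steps).
    """
    n = len(b)
    gamma = 2.0 / (2.0 - alpha - beta)
    mu = 1.0 + 2.0 * (1.0 - beta) / (beta - alpha)

    def correction(y):
        # z solves M z = b - A y with M = diag(A).
        Ay = matvec(A, y)
        return [(b[i] - Ay[i]) / A[i][i] for i in range(n)]

    y_prev = [0.0] * n
    z = correction(y_prev)
    y = [y_prev[i] + gamma * z[i] for i in range(n)]    # y^(1) = y^(0) + gamma z^(0)
    omega = 2.0                                           # omega_1 = 2 mu c_0(mu) / c_1(mu)
    for _ in range(1, steps):
        omega = 1.0 / (1.0 - omega / (4.0 * mu * mu))     # omega_{k+1} from omega_k
        z = correction(y)
        y_next = [omega * (y[i] - y_prev[i] + gamma * z[i]) + y_prev[i]
                  for i in range(n)]
        y_prev, y = y, y_next
    return y


def jacobi(A, b, steps):
    """Plain Jacobi from x = 0, for comparison."""
    n = len(b)
    x = [0.0] * n
    for _ in range(steps):
        Ax = matvec(A, x)
        x = [x[i] + (b[i] - Ax[i]) / A[i][i] for i in range(n)]
    return x


n = 20
A = [[2.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0) for j in range(n)]
     for i in range(n)]
b = [1.0] * n
exact = [(i + 1) * (n - i) / 2.0 for i in range(n)]   # solves A x = b
beta = math.cos(math.pi / (n + 1))                     # eigenvalues of G: cos(j pi/(n+1))
y = chebyshev_jacobi(A, b, -beta, beta, 200)
x = jacobi(A, b, 200)
cheb_err = max(abs(u - v) for u, v in zip(y, exact))
jac_err = max(abs(u - v) for u, v in zip(x, exact))
print(cheb_err, jac_err)
```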

Chebyshev semi-iterative methods are extensively analyzed in Varga 
(1962, chapter 5), as well as in Golub and Varga (1961). 


10.1.6 Symmetric SOR 


In deriving the Chebyshev acceleration we assumed that the iteration
matrix $G = M^{-1}N$ was symmetric. Thus, our simple analysis does not apply
to the unsymmetric SOR iteration matrix $M_\omega^{-1}N_\omega$. However, it is
possible to symmetrize the SOR method, making it amenable to Chebyshev
acceleration. The idea is to couple SOR with the backward SOR scheme


for i = n:-1:1
    $x_i^{(k+1)} = \omega\Big(b_i - \sum_{j=1}^{i-1} a_{ij}x_j^{(k)} - \sum_{j=i+1}^{n} a_{ij}x_j^{(k+1)}\Big)\Big/a_{ii} + (1-\omega)x_i^{(k)}$        (10.1.13)
end


This iteration is obtained by updating the unknowns in reverse order in
(10.1.7). Backward SOR can be described in matrix terms using (10.1.4).
In particular, we have $\tilde M_\omega x^{(k+1)} = \tilde N_\omega x^{(k)} + \omega b$ where

$$\tilde M_\omega = D + \omega U \quad\text{and}\quad \tilde N_\omega = (1-\omega)D - \omega L. \qquad (10.1.14)$$
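If $A$ is symmetric ($U = L^T$), then $\tilde M_\omega = M_\omega^T$ and $\tilde N_\omega = N_\omega^T$, and we have
the iteration

$$
\begin{aligned}
M_\omega x^{(k+1/2)} &= N_\omega x^{(k)} + \omega b \\
M_\omega^T x^{(k+1)} &= N_\omega^T x^{(k+1/2)} + \omega b.
\end{aligned}
\qquad (10.1.15)
$$

It is clear that $G = M_\omega^{-T}N_\omega^T M_\omega^{-1}N_\omega$ is the iteration matrix for this
method. From the definitions of $M_\omega$ and $N_\omega$ it follows that

$$G = M^{-1}N = (M_\omega D^{-1} M_\omega^T)^{-1}(N_\omega^T D^{-1} N_\omega). \qquad (10.1.16)$$

If $D$ has positive diagonal entries and $KK^T = N_\omega^T D^{-1} N_\omega$ is the Cholesky
factorization, then $K^T G K^{-T} = K^T(M_\omega D^{-1} M_\omega^T)^{-1}K$. Thus, $G$ is similar
to a symmetric matrix and has real eigenvalues.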

The iteration (10.1.15) is called the symmetric successive over-relaxation
(SSOR) method. It is frequently used in conjunction with the Chebyshev 
semi-iterative acceleration. 
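One SSOR step (10.1.15) can be sketched as a forward SOR sweep followed by a backward SOR sweep, both overwriting $x$. Because of the overwriting, the backward sweep automatically uses the half-step values $x^{(k+1/2)}$. The test matrix and $\omega = 1.5$ are illustrative assumptions.

```python
def ssor_sweep(A, b, x, omega):
    """One SSOR step (10.1.15): a forward SOR sweep (10.1.7) followed by a
    backward SOR sweep (10.1.13), both overwriting x in place."""
    n = len(b)
    for i in list(range(n)) + list(range(n - 1, -1, -1)):
        s = sum(A[i][j] * x[j] for j in range(n) if j != i)
        x[i] = omega * (b[i] - s) / A[i][i] + (1.0 - omega) * x[i]
    return x


# Illustrative symmetric positive definite test problem.
n = 10
A = [[2.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0) for j in range(n)]
     for i in range(n)]
b = [1.0] * n
exact = [(i + 1) * (n - i) / 2.0 for i in range(n)]   # solves A x = b
x = [0.0] * n
for _ in range(1000):
    ssor_sweep(A, b, x, 1.5)
ssor_err = max(abs(u - v) for u, v in zip(x, exact))
print(ssor_err)
```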


Problems 


P10.1.1 Show that the Jacobi iteration can be written in the form $x^{(k+1)} = x^{(k)} + Hr^{(k)}$
where $r^{(k)} = b - Ax^{(k)}$. Repeat for the Gauss-Seidel iteration.


P10.1.2 Show that if A is strictly diagonally dominant, then the Gauss-Seidel iteration 
converges. 


P10.1.3 Show that the Jacobi iteration converges for 2-by-2 symmetric positive definite 
systems. 


P10.1.4 Show that if $A = M - N$ is singular, then we can never have $\rho(M^{-1}N) < 1$
even if $M$ is nonsingular.


P10.1.5 Prove (10.1.16). 


P10.1.6 Prove the converse of Theorem 10.1.1. In other words, show that if the iteration
$Mx^{(k+1)} = Nx^{(k)} + b$ always converges, then $\rho(M^{-1}N) < 1$.


P10.1.7 (Supplied by R.S. Varga) Suppose that

$$
A_1 = \begin{bmatrix} 1 & -1/2 \\ -1/2 & 1 \end{bmatrix}, \qquad
A_2 = \begin{bmatrix} 1 & -3/4 \\ -1/12 & 1 \end{bmatrix}.
$$

Let $J_1$ and $J_2$ be the associated Jacobi iteration matrices. Show that $\rho(J_1) > \rho(J_2)$,
thereby refuting the claim that greater diagonal dominance implies more rapid Jacobi
convergence.


P10.1.8 The Chebyshev algorithm is defined in terms of parameters

$$\omega_{k+1} = \frac{2c_k(1/\rho)}{\rho\, c_{k+1}(1/\rho)}$$

where $c_k(\lambda) = \cosh[k \cosh^{-1}(\lambda)]$ with $\lambda > 1$. (a) Show that $1 < \omega_k < 2$ for $k > 1$
whenever $0 < \rho < 1$. (b) Verify that $\omega_{k+1} < \omega_k$. (c) Determine $\lim \omega_k$ as $k \to \infty$.

P10.1.9 Consider the 2-by-2 matrix
(a) Under what conditions will Gauss-Seidel converge with this matrix? (b) For what
range of $\omega$ will the SOR method converge? What is the optimal choice for this parameter?
(c) Repeat (a) and (b) for the matrix

$$A = \begin{bmatrix} I_n & S \\ -S^T & I_n \end{bmatrix}$$

where $S \in \mathbb{R}^{n\times n}$. Hint: Use the SVD of $S$.


P10.1.10 We want to investigate the solution of $Au = f$ where $A \ne A^T$. For a model
problem, consider the finite difference approximation to

$$-u'' + \sigma u' = 0, \qquad 0 < x < 1,$$

where $u(0) = 10$ and $u(1) = 10e^{\sigma}$. This leads to the difference equation

$$-u_{i-1} + 2u_i - u_{i+1} + R(u_{i+1} - u_{i-1}) = 0, \qquad i = 1{:}n,$$

where $R = \sigma h/2$, $u_0 = 10$, and $u_{n+1} = 10e^{\sigma}$. The number $R$ should be less than
1. What is the convergence rate for the iteration $Mu^{(k+1)} = Nu^{(k)} + f$ where
$M = (A + A^T)/2$ and $N = (A^T - A)/2$?


P10.1.11 Consider the iteration

$$y^{(k+1)} = \omega(By^{(k)} + d - y^{(k-1)}) + y^{(k-1)}$$

where $B$ has Schur decomposition $Q^TBQ = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$ with $\lambda_1 \ge \cdots \ge \lambda_n$.
Assume that $x = Bx + d$. (a) Derive an equation for $e^{(k)} = y^{(k)} - x$. (b) Assume
$y^{(1)} = By^{(0)} + d$. Show that $e^{(k)} = p_k(B)e^{(0)}$ where $p_k$ is an even polynomial if $k$ is
even and an odd polynomial if $k$ is odd. (c) Write $f^{(k)} = Q^Te^{(k)}$. Derive a difference
equation for $f_j^{(k)}$ for $j = 1{:}n$. Try to specify the exact solution for general $f_j^{(0)}$ and $f_j^{(1)}$.
(d) Show how to determine an optimal $\omega$.
10.2 The Conjugate Gradient Method


A difficulty associated with the SOR, Chebyshev semi-iterative, and related 
methods is that they depend upon parameters that are sometimes hard to 
choose properly. For example, the Chebyshev acceleration scheme needs 
good estimates of the largest and smallest eigenvalues of the underlying
iteration matrix $M^{-1}N$. Unless this matrix is sufficiently structured, it
may be analytically impossible and/or computationally expensive to do 
this. 

In this section, we present a method without this difficulty for the symmetric
positive definite $Ax = b$ problem, the well-known Hestenes-Stiefel
conjugate gradient method. We derived this method in §9.3.1 from the
Lanczos algorithm. The derivation now is from a different point of view
and it will set the stage for various important generalizations in §10.3 and
§10.4.


10.2.1 Steepest Descent 


The starting point in the derivation is to consider how we might go about
minimizing the function

$$\phi(x) = \frac{1}{2}x^TAx - x^Tb$$

where $b \in \mathbb{R}^n$ and $A \in \mathbb{R}^{n\times n}$ is assumed to be positive definite and symmetric.
The minimum value of $\phi(x)$ is $-b^TA^{-1}b/2$, achieved by setting $x = A^{-1}b$.
Thus, minimizing $\phi$ and solving $Ax = b$ are equivalent problems
if $A$ is symmetric positive definite.

One of the simplest strategies for minimizing $\phi$ is the method of steepest
descent. At a current point $x_c$, the function $\phi$ decreases most rapidly in the
direction of the negative gradient: $-\nabla\phi(x_c) = b - Ax_c$. We call

$$r_c = b - Ax_c$$

the residual of $x_c$. If the residual is nonzero, then there exists a positive
$\alpha$ such that $\phi(x_c + \alpha r_c) < \phi(x_c)$. In the method of steepest descent (with
exact line search) we set $\alpha = r_c^Tr_c/r_c^TAr_c$, thereby minimizing

$$\phi(x_c + \alpha r_c) = \phi(x_c) - \alpha r_c^Tr_c + \frac{1}{2}\alpha^2r_c^TAr_c.$$

This gives


x_0 = initial guess
r_0 = b - Ax_0
k = 0
while r_k ≠ 0
    k = k + 1                                               (10.2.1)
    α_k = r_{k-1}^T r_{k-1} / r_{k-1}^T A r_{k-1}
    x_k = x_{k-1} + α_k r_{k-1}
    r_k = b - Ax_k
end


It can be shown that 


$$
\phi(x_k) + \frac{1}{2}b^TA^{-1}b \le \left(1 - \frac{1}{\kappa_2(A)}\right)\left(\phi(x_{k-1}) + \frac{1}{2}b^TA^{-1}b\right) \qquad (10.2.2)
$$

which implies global convergence. Unfortunately, the rate of convergence
may be prohibitively slow if the condition number $\kappa_2(A) = \lambda_1(A)/\lambda_n(A)$ is large.
Geometrically this means that the level curves of $\phi$ are very elongated
hyperellipsoids and minimization corresponds to finding the lowest point
in a relatively flat, steep-sided valley. In steepest descent, we are forced
to traverse back and forth across the valley rather than down the valley.
Stated another way, the gradient directions that arise during the iteration
are not different enough.
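Below is a minimal Python sketch of (10.2.1). It assumes dense list-of-rows storage and replaces the exact test $r_k \ne 0$ with a relative residual tolerance (our assumption). The 2-by-2 example uses $A = \mathrm{diag}(1, \kappa)$ with a starting error proportional to $(\kappa, 1)$. For this choice, each step reduces the error by exactly $(\kappa - 1)/(\kappa + 1)$ in exact arithmetic, so the iteration count grows roughly like $\kappa$: the back-and-forth zigzag described above.

```python
import math


def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def steepest_descent(A, b, x0, tol=1e-10, max_iter=100000):
    """Steepest descent with exact line search, as in (10.2.1).

    The loop test r_k != 0 is replaced (assumption) by the practical test
    ||r_k||_2 <= tol * ||b||_2. Returns (x_k, k).
    """
    x = list(x0)
    r = [bi - yi for bi, yi in zip(b, matvec(A, x))]
    bnorm = math.sqrt(dot(b, b))
    k = 0
    while math.sqrt(dot(r, r)) > tol * bnorm and k < max_iter:
        k += 1
        Ar = matvec(A, r)
        alpha = dot(r, r) / dot(r, Ar)            # exact minimizer along r_{k-1}
        x = [xi + alpha * ri for xi, ri in zip(x, r)]
        r = [bi - yi for bi, yi in zip(b, matvec(A, x))]
    return x, k


# Illustrative example: solution (1, 1), starting error (kappa, 1).
results = {}
for kappa in (2.0, 100.0):
    A = [[1.0, 0.0], [0.0, kappa]]
    b = [1.0, kappa]
    results[kappa] = steepest_descent(A, b, [1.0 + kappa, 2.0])
    print(kappa, results[kappa][1])
```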


10.2.2 General Search Directions 


To avoid the pitfalls of steepest descent, we consider the successive
minimization of $\phi$ along a set of directions $\{p_1, p_2, \ldots\}$ that do not necessarily
correspond to the residuals $\{r_0, r_1, \ldots\}$. It is easy to show that
$\phi(x_{k-1} + \alpha p_k)$ is minimized by setting

$$\alpha = \alpha_k = p_k^Tr_{k-1}/p_k^TAp_k.$$

With this choice it can be shown that

$$
\phi(x_{k-1} + \alpha_kp_k) = \phi(x_{k-1}) - \frac{1}{2}\frac{(p_k^Tr_{k-1})^2}{p_k^TAp_k}. \qquad (10.2.3)
$$

To ensure a reduction in the size of $\phi$ we insist that $p_k$ not be orthogonal
to $r_{k-1}$. This leads to the following framework:
x_0 = initial guess
r_0 = b - Ax_0
k = 0
while r_k ≠ 0
    k = k + 1                                               (10.2.4)
    Choose a direction p_k such that p_k^T r_{k-1} ≠ 0.
    α_k = p_k^T r_{k-1} / p_k^T A p_k
    x_k = x_{k-1} + α_k p_k
    r_k = b - Ax_k
end

Note that

$$x_k \in x_0 + \mathrm{span}\{p_1, \ldots, p_k\} = \{\, x_0 + \gamma_1p_1 + \cdots + \gamma_kp_k : \gamma_i \in \mathbb{R} \,\}.$$

Our goal is to choose the search directions in a way that guarantees
convergence without the shortcomings of steepest descent.

10.2.3 A-Conjugate Search Directions 
If the search directions are linearly independent and $x_k$ solves the problem

$$\min_{x \in x_0 + \mathrm{span}\{p_1, \ldots, p_k\}} \phi(x) \qquad (10.2.5)$$

for $k = 1, 2, \ldots$, then convergence is guaranteed in at most $n$ steps. This is
because $x_n$ minimizes $\phi$ over $\mathbb{R}^n$ and therefore satisfies $Ax_n = b$.

However, for this to be a viable approach the search directions must
have the property that it is “easy” to compute $x_k$ given $x_{k-1}$. Let us see
what this says about the determination of $p_k$. If

$$x_k = x_0 + P_{k-1}y + \alpha p_k$$

where $P_{k-1} = [p_1, \ldots, p_{k-1}]$, $y \in \mathbb{R}^{k-1}$ and $\alpha \in \mathbb{R}$, then

$$
\phi(x_k) = \phi(x_0 + P_{k-1}y) + \alpha y^TP_{k-1}^TAp_k + \frac{\alpha^2}{2}p_k^TAp_k - \alpha p_k^Tr_0.
$$

If $p_k \in \mathrm{span}\{Ap_1, \ldots, Ap_{k-1}\}^\perp$, then the cross term $\alpha y^TP_{k-1}^TAp_k$ is zero
and the search for the minimizing $x_k$ splits into a pair of uncoupled
minimizations, one for $y$ and one for $\alpha$:

$$
\begin{aligned}
\min_{x_k \in x_0 + \mathrm{span}\{p_1, \ldots, p_k\}} \phi(x_k)
&= \min_{y,\alpha} \phi(x_0 + P_{k-1}y + \alpha p_k) \\
&= \min_{y,\alpha} \left( \phi(x_0 + P_{k-1}y) + \frac{\alpha^2}{2}p_k^TAp_k - \alpha p_k^Tr_0 \right) \\
&= \min_{y} \phi(x_0 + P_{k-1}y) + \min_{\alpha} \left( \frac{\alpha^2}{2}p_k^TAp_k - \alpha p_k^Tr_0 \right).
\end{aligned}
$$


Note that if $y_{k-1}$ solves the first min problem then $x_{k-1} = x_0 + P_{k-1}y_{k-1}$
minimizes $\phi$ over $x_0 + \mathrm{span}\{p_1, \ldots, p_{k-1}\}$. The solution to the $\alpha$ min problem
is given by $\alpha_k = p_k^Tr_0/p_k^TAp_k$. Note that because of A-conjugacy,

$$
p_k^Tr_{k-1} = p_k^T(b - Ax_{k-1}) = p_k^T\big(b - A(x_0 + P_{k-1}y_{k-1})\big) = p_k^Tr_0.
$$

With these results it follows that $x_k = x_{k-1} + \alpha_kp_k$ and we obtain the
following instance of (10.2.4):

x_0 = initial guess
k = 0
r_0 = b - Ax_0
while r_k ≠ 0
    k = k + 1
    Choose p_k ∈ span{Ap_1, ..., Ap_{k-1}}^⊥ so p_k^T r_{k-1} ≠ 0.        (10.2.6)
    α_k = p_k^T r_{k-1} / p_k^T A p_k
    x_k = x_{k-1} + α_k p_k
    r_k = b - Ax_k
end


The following lemma shows that it is possible to find the search directions 
with the required properties. 


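Lemma 10.2.1 If $r_{k-1} \ne 0$, then there exists a $p_k \in \mathrm{span}\{Ap_1, \ldots, Ap_{k-1}\}^\perp$
such that $p_k^Tr_{k-1} \ne 0$.

Proof. For the case $k = 1$, set $p_1 = r_0$. If $k > 1$, then since $r_{k-1} \ne 0$ it
follows that

$$
\begin{aligned}
A^{-1}b \notin x_0 + \mathrm{span}\{p_1, \ldots, p_{k-1}\}
&\;\Rightarrow\; b \notin Ax_0 + \mathrm{span}\{Ap_1, \ldots, Ap_{k-1}\} \\
&\;\Rightarrow\; r_0 \notin \mathrm{span}\{Ap_1, \ldots, Ap_{k-1}\}.
\end{aligned}
$$

Thus there exists a $p \in \mathrm{span}\{Ap_1, \ldots, Ap_{k-1}\}^\perp$ such that $p^Tr_0 \ne 0$. But
$x_{k-1} \in x_0 + \mathrm{span}\{p_1, \ldots, p_{k-1}\}$ and so $r_{k-1} \in r_0 + \mathrm{span}\{Ap_1, \ldots, Ap_{k-1}\}$.
It follows that $p^Tr_{k-1} = p^Tr_0 \ne 0$. □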


The search directions in (10.2.6) are said to be A-conjugate because
$p_i^TAp_j = 0$ for all $i \ne j$. Note that if $P_k = [p_1, \ldots, p_k]$ is the matrix of these
vectors, then

$$P_k^TAP_k = \mathrm{diag}(p_1^TAp_1, \ldots, p_k^TAp_k)$$

is nonsingular since $A$ is positive definite and the search directions are
nonzero. It follows that $P_k$ has full column rank. This guarantees convergence
in (10.2.6) in at most $n$ steps because $x_n$ (if we get that far) minimizes
$\phi(x)$ over $\mathrm{ran}(P_n) = \mathbb{R}^n$.
10.2.4 Choosing a Best Search Direction


A way to combine the positive aspects of steepest descent and A-conjugate
searching is to choose $p_k$ in (10.2.6) to be the closest vector to $r_{k-1}$ that is
A-conjugate to $p_1, \ldots, p_{k-1}$. This defines “version zero” of the method of
conjugate gradients:

x_0 = initial guess
k = 0
r_0 = b - Ax_0
while r_k ≠ 0
    k = k + 1
    if k = 1
        p_1 = r_0
    else                                                    (10.2.7)
        Let p_k minimize ‖p - r_{k-1}‖_2 over all vectors
            p ∈ span{Ap_1, ..., Ap_{k-1}}^⊥
    end
    α_k = p_k^T r_{k-1} / p_k^T A p_k
    x_k = x_{k-1} + α_k p_k
    r_k = b - Ax_k
end
x = x_k


To make this an effective sparse Ax = b solver, we need an efficient method 
for computing $p_k$. A considerable amount of analysis is required to develop
the final recursions. The first step is to show that $p_k$ is the minimum
residual of a certain least squares problem. 


Lemma 10.2.2 For $k \ge 2$ the vectors $p_k$ generated by (10.2.7) satisfy

$$p_k = r_{k-1} - AP_{k-1}z_{k-1},$$

where $P_{k-1} = [p_1, \ldots, p_{k-1}]$ and $z_{k-1}$ solves $\min_{z \in \mathbb{R}^{k-1}} \| r_{k-1} - AP_{k-1}z \|_2$.

Proof. Suppose $z_{k-1}$ solves the above LS problem and let $p$ be the associated
minimum residual:

$$p = r_{k-1} - AP_{k-1}z_{k-1}.$$

It follows that $p^TAP_{k-1} = 0$. Moreover, $p = [I - (AP_{k-1})(AP_{k-1})^+]r_{k-1}$
is the orthogonal projection of $r_{k-1}$ into $\mathrm{ran}(AP_{k-1})^\perp$ and so it is the closest
vector in $\mathrm{ran}(AP_{k-1})^\perp$ to $r_{k-1}$. Thus, $p = p_k$. □
With this result we can establish a number of important relationships between
the residuals $r_k$, the search directions $p_k$, and the Krylov subspaces
$\mathcal{K}(r_0, A, k) = \mathrm{span}\{r_0, Ar_0, \ldots, A^{k-1}r_0\}$.

Theorem 10.2.3 After $k$ iterations in (10.2.7) we have

$$
\begin{aligned}
r_k &= r_{k-1} - \alpha_kAp_k \qquad (10.2.8) \\
P_k^Tr_k &= 0 \qquad (10.2.9) \\
\mathrm{span}\{p_1, \ldots, p_k\} &= \mathrm{span}\{r_0, \ldots, r_{k-1}\} = \mathcal{K}(r_0, A, k) \qquad (10.2.10)
\end{aligned}
$$

and the residuals $r_0, \ldots, r_k$ are mutually orthogonal.


Proof. Equation (10.2.8) follows by applying $A$ to both sides of
$x_k = x_{k-1} + \alpha_kp_k$ and using the definition of the residual.

To prove (10.2.9), we recall that $x_k = x_0 + P_ky_k$, where $y_k$ is the minimizer of

$$\phi(x_0 + P_ky) = \phi(x_0) + \frac{1}{2}y^T(P_k^TAP_k)y - y^TP_k^T(b - Ax_0).$$

But this means that $y_k$ solves the linear system $(P_k^TAP_k)y = P_k^T(b - Ax_0)$.
Thus

$$0 = P_k^T(b - Ax_0) - P_k^TAP_ky_k = P_k^T\big(b - A(x_0 + P_ky_k)\big) = P_k^Tr_k.$$

To prove (10.2.10) we note from (10.2.8) that

$$\{Ap_1, \ldots, Ap_{k-1}\} \subseteq \mathrm{span}\{r_0, \ldots, r_{k-1}\}$$

and so from Lemma 10.2.2,

$$p_k = r_{k-1} - [Ap_1, \ldots, Ap_{k-1}]\,z_{k-1} \in \mathrm{span}\{r_0, \ldots, r_{k-1}\}.$$

It follows that

$$[p_1, \ldots, p_k] = [r_0, \ldots, r_{k-1}]\,T$$

for some upper triangular $T$. Since the search directions are independent,
$T$ is nonsingular. This shows

$$\mathrm{span}\{p_1, \ldots, p_k\} = \mathrm{span}\{r_0, \ldots, r_{k-1}\}.$$

Using (10.2.8) we see that

$$r_k \in \mathrm{span}\{r_{k-1}, Ap_k\} \subseteq \mathrm{span}\{r_{k-1}, Ar_0, \ldots, Ar_{k-1}\}.$$

The Krylov space connection in (10.2.10) follows from this by induction.

Finally, to establish the mutual orthogonality of the residuals, we note
from (10.2.9) that $r_k$ is orthogonal to any vector in the range of $P_k$. But
from (10.2.10) this subspace contains $r_0, \ldots, r_{k-1}$. □


Using these facts we next show that $p_k$ is a simple linear combination
of its predecessor $p_{k-1}$ and the “current” residual $r_{k-1}$.
Corollary 10.2.4 The residuals and search directions in (10.2.7) have the
property that $p_k \in \mathrm{span}\{p_{k-1}, r_{k-1}\}$ for $k \ge 2$.

Proof. If $k = 2$, then from (10.2.10) $p_2 \in \mathrm{span}\{r_0, r_1\}$. But $p_1 = r_0$ and
so $p_2$ is a linear combination of $p_1$ and $r_1$.

If $k > 2$, then partition the vector $z_{k-1}$ of Lemma 10.2.2 as

$$z_{k-1} = \begin{bmatrix} w \\ \mu \end{bmatrix}, \qquad w \in \mathbb{R}^{k-2}, \; \mu \in \mathbb{R}.$$

Using the identity $r_{k-1} = r_{k-2} - \alpha_{k-1}Ap_{k-1}$, we see that

$$
p_k = r_{k-1} - AP_{k-1}z_{k-1} = r_{k-1} - AP_{k-2}w - \mu Ap_{k-1}
= \left(1 + \frac{\mu}{\alpha_{k-1}}\right) r_{k-1} + s_{k-1}
$$

where

$$
\begin{aligned}
s_{k-1} &= -\frac{\mu}{\alpha_{k-1}}\,r_{k-2} - AP_{k-2}w \\
&\in \mathrm{span}\{r_{k-2}, AP_{k-2}w\} \\
&\subseteq \mathrm{span}\{r_{k-2}, Ap_1, \ldots, Ap_{k-2}\} \\
&\subseteq \mathrm{span}\{r_0, \ldots, r_{k-2}\}.
\end{aligned}
$$

Because the $r_i$ are mutually orthogonal, it follows that $s_{k-1}$ and $r_{k-1}$ are
orthogonal to each other. Thus, the least squares problem of Lemma 10.2.2
boils down to choosing $\mu$ and $w$ such that

$$
\| p_k \|_2^2 = \left(1 + \frac{\mu}{\alpha_{k-1}}\right)^2 \| r_{k-1} \|_2^2 + \| s_{k-1} \|_2^2
$$

is minimum. Since the 2-norm of $r_{k-2} - AP_{k-2}z$ is minimized by $z_{k-2}$ giving
residual $p_{k-1}$, it follows that $s_{k-1}$ is a multiple of $p_{k-1}$. Consequently,
$p_k \in \mathrm{span}\{r_{k-1}, p_{k-1}\}$. □


We are now set to derive a very simple expression for $p_k$. Without loss
of generality we may assume from Corollary 10.2.4 that

$$p_k = r_{k-1} + \beta_kp_{k-1}.$$

Since $p_{k-1}^TAp_k = 0$ it follows that

$$\beta_k = -\frac{p_{k-1}^TAr_{k-1}}{p_{k-1}^TAp_{k-1}}.$$

This leads us to “version 1” of the conjugate gradient method:
x_0 = initial guess
k = 0
r_0 = b - Ax_0
while r_k ≠ 0
    k = k + 1
    if k = 1
        p_1 = r_0
    else
        β_k = -p_{k-1}^T A r_{k-1} / p_{k-1}^T A p_{k-1}
        p_k = r_{k-1} + β_k p_{k-1}                             (10.2.11)
    end
    α_k = p_k^T r_{k-1} / p_k^T A p_k
    x_k = x_{k-1} + α_k p_k
    r_k = b - Ax_k
end
x = x_k


In this implementation, the method requires three separate matrix-vector
multiplications per step. However, by computing residuals recursively via
$r_k = r_{k-1} - \alpha_kAp_k$ and substituting

$$r_{k-1}^Tr_{k-1} = -\alpha_{k-1}r_{k-1}^TAp_{k-1} \qquad (10.2.12)$$

and

$$r_{k-2}^Tr_{k-2} = \alpha_{k-1}p_{k-1}^TAp_{k-1} \qquad (10.2.13)$$

into the formula for $\beta_k$, we obtain the following more efficient version:

Algorithm 10.2.1 [Conjugate Gradients] If $A \in \mathbb{R}^{n\times n}$ is symmetric
positive definite, $b \in \mathbb{R}^n$, and $x_0 \in \mathbb{R}^n$ is an initial guess ($Ax_0 \approx b$), then
the following algorithm computes $x \in \mathbb{R}^n$ so $Ax = b$.

k = 0
r_0 = b - Ax_0
while r_k ≠ 0
    k = k + 1
    if k = 1
        p_1 = r_0
    else
        β_k = r_{k-1}^T r_{k-1} / r_{k-2}^T r_{k-2}
        p_k = r_{k-1} + β_k p_{k-1}
    end
    α_k = r_{k-1}^T r_{k-1} / p_k^T A p_k
    x_k = x_{k-1} + α_k p_k
    r_k = r_{k-1} - α_k A p_k
end
x = x_k
This procedure is essentially the form of the conjugate gradient algorithm
that appears in the original paper by Hestenes and Stiefel (1952). Note 
that only one matrix-vector multiplication is required per iteration. 


10.2.5 The Lanczos Connection 


In §9.3.1 we derived the conjugate gradient method from the Lanczos
algorithm. Now let us look at the connections between these two algorithms
in the reverse direction by “deriving” the Lanczos process from conjugate
gradients. Define the matrix of residuals $R_k \in \mathbb{R}^{n\times k}$ by

$$R_k = [\, r_0, \ldots, r_{k-1} \,]$$

and the upper bidiagonal matrix $B_k \in \mathbb{R}^{k\times k}$ by

$$
B_k = \begin{bmatrix}
1 & -\beta_2 & & 0 \\
 & 1 & \ddots & \\
 & & \ddots & -\beta_k \\
0 & & & 1
\end{bmatrix}.
$$

From the equations $p_i = r_{i-1} + \beta_ip_{i-1}$, $i = 2{:}k$, and $p_1 = r_0$ it follows that
$R_k = P_kB_k$. Since the columns of $P_k = [p_1, \ldots, p_k]$ are A-conjugate, we
see that $R_k^TAR_k = B_k^T\mathrm{diag}(p_1^TAp_1, \ldots, p_k^TAp_k)B_k$ is tridiagonal. From
(10.2.10) it follows that if

$$\Delta = \mathrm{diag}(\rho_0, \ldots, \rho_{k-1}), \qquad \rho_i = \| r_i \|_2,$$

then the columns of $R_k\Delta^{-1}$ form an orthonormal basis for the subspace
$\mathrm{span}\{r_0, Ar_0, \ldots, A^{k-1}r_0\}$. Consequently, the columns of this matrix are
essentially the Lanczos vectors of Algorithm 9.3.1, i.e.,

$$q_i = \pm r_{i-1}/\rho_{i-1}, \qquad i = 1{:}k.$$

Moreover, the tridiagonal matrix associated with these Lanczos vectors is
given by

$$T_k = \Delta^{-1}B_k^T\,\mathrm{diag}(p_i^TAp_i)\,B_k\Delta^{-1}. \qquad (10.2.14)$$

The diagonal and subdiagonal of this matrix involve quantities that are
readily available during the conjugate gradient iteration. Thus, we can
obtain good estimates of A's extremal eigenvalues (and condition number)
as we generate the $x_k$ in Algorithm 10.2.1.
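The sketch below shows how (10.2.14) yields eigenvalue estimates. It proceeds in three steps:

- Run $k$ steps of Algorithm 10.2.1.
- Assemble the diagonal and off-diagonal of $T_k$ from the recorded $\beta_i$, $p_i^TAp_i$, and $\|r_i\|_2$.
- Find the extreme eigenvalues of $T_k$ by Sturm-sequence bisection, a standard technique for symmetric tridiagonal matrices chosen here for brevity.

The model matrix, starting vector, and all function names are illustrative assumptions.

```python
import math


def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def cg_tridiagonal(A, b, k):
    """Run k steps of Algorithm 10.2.1 from x_0 = 0 and return the diagonal and
    off-diagonal of T_k in (10.2.14). x_k itself is not needed, so it is not formed."""
    r = list(b)
    p = list(r)
    rr = []        # rr[i] = r_i^T r_i = rho_i^2
    beta = [0.0]   # beta[i] = beta_{i+1}; beta[0] is an unused placeholder
    d = []         # d[i] = p_{i+1}^T A p_{i+1}
    for i in range(k):
        rr.append(dot(r, r))
        if i > 0:
            beta.append(rr[i] / rr[i - 1])
            p = [ri + beta[i] * pi for ri, pi in zip(r, p)]
        w = matvec(A, p)
        d.append(dot(p, w))
        alpha = rr[i] / d[i]
        r = [ri - alpha * wi for ri, wi in zip(r, w)]
    # Column i of B_k has 1 in row i and -beta[i] in row i-1, so B_k^T D B_k is
    # tridiagonal; Delta^{-1} scales entry (i, j) by 1/(rho_i rho_j).
    diag = [(d[i] + (beta[i] ** 2 * d[i - 1] if i > 0 else 0.0)) / rr[i]
            for i in range(k)]
    off = [-beta[i + 1] * d[i] / math.sqrt(rr[i] * rr[i + 1]) for i in range(k - 1)]
    return diag, off


def count_below(diag, off, x):
    """Sturm count: number of eigenvalues of the tridiagonal matrix below x."""
    count, q = 0, 1.0
    for i in range(len(diag)):
        q = diag[i] - x - (off[i - 1] ** 2 / q if i > 0 else 0.0)
        if q == 0.0:
            q = 1e-300          # nudge an exact zero pivot
        if q < 0.0:
            count += 1
    return count


def kth_eigenvalue(diag, off, j, tol=1e-13):
    """j-th smallest eigenvalue (0-based) by bisection on a Gershgorin interval."""
    n = len(diag)
    rad = [(abs(off[i - 1]) if i > 0 else 0.0) + (abs(off[i]) if i < n - 1 else 0.0)
           for i in range(n)]
    lo = min(diag[i] - rad[i] for i in range(n))
    hi = max(diag[i] + rad[i] for i in range(n))
    while hi - lo > tol * max(1.0, abs(lo), abs(hi)):
        mid = 0.5 * (lo + hi)
        if count_below(diag, off, mid) > j:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


n = 20
A = [[2.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0) for j in range(n)]
     for i in range(n)]
b = [1.0] * n
diag, off = cg_tridiagonal(A, b, 8)
ritz_min = kth_eigenvalue(diag, off, 0)
ritz_max = kth_eigenvalue(diag, off, len(diag) - 1)
lam_min = 2.0 - 2.0 * math.cos(math.pi / (n + 1))   # exact extremes of tridiag(-1, 2, -1)
lam_max = 2.0 + 2.0 * math.cos(math.pi / (n + 1))
print(ritz_min, lam_min, ritz_max, lam_max)
```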
10.2.6 Some Practical Details


The termination criterion in Algorithm 10.2.1 is unrealistic. Rounding errors
lead to a loss of orthogonality among the residuals, and finite termination
is not mathematically guaranteed. Moreover, when the conjugate gradient
method is applied, $n$ is usually so big that $O(n)$ iterations represents an
unacceptable amount of work. As a consequence of these observations, it
is customary to regard the method as a genuinely iterative technique with
termination based upon an iteration maximum $k_{max}$ and the residual norm.
This leads to the following practical version of Algorithm 10.2.1:

x = initial guess
k = 0
r = b - Ax
ρ_0 = ‖r‖_2^2
while (√ρ_k > ε‖b‖_2) ∧ (k < k_max)
    k = k + 1
    if k = 1
        p = r
    else                                                    (10.2.16)
        β_k = ρ_{k-1}/ρ_{k-2}
        p = r + β_k p
    end
    w = Ap
    α_k = ρ_{k-1}/p^T w
    x = x + α_k p
    r = r - α_k w
    ρ_k = ‖r‖_2^2
end

This algorithm requires one matrix-vector multiplication and $10n$ flops per
iteration. Notice that just four n-vectors of storage are essential: $x$, $r$, $p$,
and $w$. The subscripting of the scalars is not necessary and is only done
here to facilitate comparison with Algorithm 10.2.1.
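Below is a direct Python transcription of (10.2.16). It assumes dense list-of-rows storage, whereas in practice the product $w = Ap$ would exploit sparsity. It keeps only $x$, $r$, $p$, $w$ and the two scalars $\rho_{k-1}$ and $\rho_{k-2}$. The test problem is an illustrative assumption.

```python
import math


def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def conjugate_gradient(A, b, x0, eps=1e-10, kmax=1000):
    """Practical conjugate gradient iteration (10.2.16) for symmetric positive
    definite A. Stops when ||r||_2 <= eps * ||b||_2 or after kmax steps.
    Returns the final iterate and the number of steps taken."""
    x = list(x0)
    r = [bi - yi for bi, yi in zip(b, matvec(A, x))]
    rho_prev, rho = None, dot(r, r)       # rho_{k-2} and rho_{k-1} = ||r||_2^2
    bnorm = math.sqrt(dot(b, b))
    p = []
    k = 0
    while math.sqrt(rho) > eps * bnorm and k < kmax:
        k += 1
        if k == 1:
            p = list(r)
        else:
            beta = rho / rho_prev
            p = [ri + beta * pi for ri, pi in zip(r, p)]
        w = matvec(A, p)                  # the single matrix-vector product per step
        alpha = rho / dot(p, w)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * wi for ri, wi in zip(r, w)]
        rho_prev, rho = rho, dot(r, r)
    return x, k


n = 20
A = [[2.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0) for j in range(n)]
     for i in range(n)]
b = [1.0] * n
exact = [(i + 1) * (n - i) / 2.0 for i in range(n)]   # solves A x = b
x, k = conjugate_gradient(A, b, [0.0] * n)
cg_err = max(abs(u - v) for u, v in zip(x, exact))
print(k, cg_err)
```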

It is also possible to base the termination criteria on heuristic estimates
of the error $A^{-1}r_k$ by approximating $\| A^{-1} \|_2$ with the reciprocal of the
smallest eigenvalue of the tridiagonal matrix $T_k$ given in (10.2.14).

The idea of regarding conjugate gradients as an iterative method began 
with Reid (1971). The iterative point of view is useful but then the rate of 
convergence is central to the method's success. 


10.2.7 Convergence Properties 


We conclude this section by examining the convergence of the conjugate
gradient iterates $\{x_k\}$. Two results are given and they both say that the
method performs well when $A$ is near the identity either in the sense of a
low rank perturbation or in the sense of norm.


Theorem 10.2.5 If $A = I + B$ is an n-by-n symmetric positive definite
matrix and $\mathrm{rank}(B) = r$, then Algorithm 10.2.1 converges in at most $r + 1$
steps.

Proof. The dimension of

$$\mathrm{span}\{r_0, Ar_0, \ldots, A^{k-1}r_0\} = \mathrm{span}\{r_0, Br_0, \ldots, B^{k-1}r_0\}$$

cannot exceed $r + 1$. Since $p_1, \ldots, p_k$ span this subspace and are independent,
the iteration cannot progress beyond $r + 1$ steps. □
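The proof rests on the dimension of a Krylov subspace. The following sketch estimates $\dim \mathrm{span}\{r_0, Ar_0, \ldots\}$ for $A = I + B$ with $\mathrm{rank}(B) = r$. It orthonormalizes each new vector $Aq$ against the previous ones and stops when the remaining direction is negligible. The function names and the vectors that define $B$ are arbitrary illustrative data, and the tolerance is an assumption.

```python
import math


def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def krylov_dimension(A, r0, kmax, tol=1e-10):
    """Estimate dim span{r0, A r0, ..., A^{kmax-1} r0}.

    Builds an orthonormal basis Arnoldi-style: each new vector A q is
    orthogonalized (modified Gram-Schmidt) against the basis so far, and the
    process stops when what remains is negligible relative to ||A q||_2.
    """
    nrm = math.sqrt(sum(v * v for v in r0))
    basis = [[v / nrm for v in r0]]
    while len(basis) < kmax:
        w = matvec(A, basis[-1])
        wnorm = math.sqrt(sum(v * v for v in w))
        for q in basis:
            h = sum(qi * wi for qi, wi in zip(q, w))
            w = [wi - h * qi for wi, qi in zip(w, q)]
        rest = math.sqrt(sum(v * v for v in w))
        if rest <= tol * wnorm:
            break
        basis.append([v / rest for v in w])
    return len(basis)


# Illustrative data: A = I + B, B = v1 v1^T + v2 v2^T + v3 v3^T with independent v's.
n, r = 12, 3
vs = [[math.sin((l + 2) * (i + 1)) for i in range(n)] for l in range(r)]
A = [[(1.0 if i == j else 0.0) + sum(v[i] * v[j] for v in vs) for j in range(n)]
     for i in range(n)]
dim = krylov_dimension(A, [1.0] * n, n)
print(dim, "<=", r + 1)
```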


An important metatheorem follows from this result: 


• If $A$ is close to a rank $r$ correction to the identity, then Algorithm
10.2.1 almost converges after $r + 1$ steps.


We show how this heuristic can be exploited in the next section. 
An error bound of a different flavor can be obtained in terms of the
A-norm which we define as follows:

$$\| w \|_A = \sqrt{w^TAw}.$$

Theorem 10.2.6 Suppose $A \in \mathbb{R}^{n\times n}$ is symmetric positive definite and
$b \in \mathbb{R}^n$. If Algorithm 10.2.1 produces iterates $\{x_k\}$ and $\kappa = \kappa_2(A)$, then

$$
\| x - x_k \|_A \le 2\,\| x - x_0 \|_A \left( \frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1} \right)^k.
$$

Proof. See Luenberger (1973, p.187). □
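As a worked consequence of Theorem 10.2.6 (our own short derivation, not from the text), one can bound the number of iterations that makes the right-hand side fall below $\epsilon\| x - x_0 \|_A$. Since $\ln\frac{1+t}{1-t} = 2(t + t^3/3 + \cdots) \ge 2t$ for $0 < t < 1$, taking $t = 1/\sqrt{\kappa}$ gives

$$
2\left(\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\right)^k
= 2\exp\!\left(-k\ln\frac{\sqrt{\kappa}+1}{\sqrt{\kappa}-1}\right)
\le 2e^{-2k/\sqrt{\kappa}} \le \epsilon
\quad\text{whenever}\quad
k \ge \frac{\sqrt{\kappa}}{2}\ln\frac{2}{\epsilon}.
$$

For example, with $\kappa = 10^4$ and $\epsilon = 10^{-6}$ this gives $k \ge 50\ln(2\times 10^6) \approx 725.4$, so 726 iterations suffice. This is an upper bound; actual convergence is often faster.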


The accuracy of the $\{x_k\}$ is often much better than this theorem predicts.
However, a heuristic version of Theorem 10.2.6 turns out to be very useful:

• The conjugate gradient method converges very fast in the A-norm if
$\kappa_2(A) \approx 1$.

In the next section we show how we can sometimes convert a given $Ax = b$
problem into a related $\tilde A\tilde x = \tilde b$ problem with $\tilde A$ being close to the identity.


Problems 


P10.2.1 Verify that the residuals in (10.2.1) satisfy $r_i^Tr_j = 0$ whenever $j = i + 1$.
P10.2.2 Verify (10.2.2). 

P10.2.3 Verify (10.2.3). 

P10.2.4 Verify (10.2.12) and (10.2.13). 
P10.2.5 Give formulas for the entries of the tridiagonal matrix $T_k$ in (10.2.14).


P10.2.6 Compare the work and storage requirements associated with the practical im- 
plementation of Algorithms 9.3.1 and 10.2.1. 

P10.2.7 Show that if $A \in \mathbb{R}^{n\times n}$ is symmetric positive definite and has $k$ distinct
eigenvalues, then the conjugate gradient method does not require more than $k + 1$ steps to
converge.


P10.2.8 Use Theorem 10.2.6 to verify that

$$
\| x_k - A^{-1}b \|_2 \le 2\sqrt{\kappa} \left( \frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1} \right)^k \| x_0 - A^{-1}b \|_2.
$$


Notes and References for Sec. 10.2 
M.R. Hestenes and E. Stiefel (1952). “Methods of Conjugate Gradients for Solving
Linear Systems,” J. Res. Nat. Bur. Stand. 49, 409-36.
O. Axelsson (1977). “Solution of Linear Systems of Equations: Iterative Methods,” in
Sparse Matrix Techniques: Copenhagen, 1976, ed. V.A. Barker, Springer-Verlag,
Berlin. 


For a discussion of conjugate gradient convergence behavior, see 
A. van der Sluis and H.A. Van Der Vorst (1986). “The Rate of Convergence of Conjugate

The idea of using the conjugate gradient method as an iterative method was first discussed in

J.K. Reid (1971). “On the Method of Conjugate Gradients for the Solution of Large Sparse Systems of Linear Equations,” in Large Sparse Sets of Linear Equations, ed. J.K. Reid, Academic Press, New York, pp. 231-54.

arithmetic. See


H. Wozniakowski (1980). “Roundoff Error Analysis of a New Class of Conjugate Gradient Algorithms,” Lin. Alg. and Its Applic. 29,

Lanczos and Conjugate Gradient Computations,” SIAM J. Matrix Anal. Applic. 13, 121-137.

G.W. Stewart (1975). “The Convergence of the Method of Conjugate Gradients at Isolated Extreme Points in the Spectrum,” Numer. Math. 24, 85-93.

A. Jennings (1977). “Influence of the Eigenvalue Spectrum on the Convergence Rate of the Conjugate Gradient Method,” J. Inst. Math. Applic. 20, 61-72.

Based on Conjugate Gradient Optimization,” Lin. Alg. and Its Applic. 29, 63-90.


Finally, we mention that the method can be used to compute an eigenvector of a large 
sparse symmetric matrix: 


A. Ruhe and T. Wiberg (1972). “The Method of Conjugate Gradients Used in Inverse 
Iteration,” BIT 12, 543-54. 


10.3 Preconditioned Conjugate Gradients 


We concluded the previous section by observing that the method of con- 
jugate gradients works well on matrices that are either well conditioned or 
have just a few distinct eigenvalues. (The latter is the case when A is
a low-rank perturbation of the identity.) In this section we show how to
precondition a linear system so that the matrix of coefficients assumes one 
of these nice forms. Our treatment is quite brief and informal. Golub and 
Meurant (1983) and Axelsson (1985) have more comprehensive expositions. 


10.3.1 Derivation 


Consider the n-by-n symmetric positive definite linear system $Ax = b$. The idea behind preconditioned conjugate gradients is to apply the “regular” conjugate gradient method to the transformed system

$$\tilde{A}\tilde{x} = \tilde{b}, \qquad (10.3.1)$$

where $\tilde{A} = C^{-1}AC^{-1}$, $\tilde{x} = Cx$, $\tilde{b} = C^{-1}b$, and $C$ is symmetric positive definite. In view of our remarks in §10.2.8, we should try to choose $C$ so that $\tilde{A}$ is well conditioned or a matrix with clustered eigenvalues. For reasons that will soon emerge, the matrix $C^2$ must also be “simple.”
If we apply Algorithm 10.2.1 to (10.3.1), then we obtain the iteration

    k = 0
    x̃_0 = initial guess (Ãx̃_0 ≈ b̃)
    r̃_0 = b̃ - Ãx̃_0
    while r̃_k ≠ 0
        k = k + 1
        if k = 1
            p̃_1 = r̃_0
        else
            β_k = r̃_{k-1}^T r̃_{k-1} / r̃_{k-2}^T r̃_{k-2}
            p̃_k = r̃_{k-1} + β_k p̃_{k-1}
        end
        α_k = r̃_{k-1}^T r̃_{k-1} / p̃_k^T C^{-1}AC^{-1} p̃_k
        x̃_k = x̃_{k-1} + α_k p̃_k
        r̃_k = r̃_{k-1} - α_k C^{-1}AC^{-1} p̃_k
    end
    x̃ = x̃_k                                                  (10.3.2)


Here, $\tilde{x}_k$ should be regarded as an approximation to $\tilde{x}$ and $\tilde{r}_k$ is the residual in the transformed coordinates, i.e., $\tilde{r}_k = \tilde{b} - \tilde{A}\tilde{x}_k$. Of course, once we have $\tilde{x}$, then we can obtain $x$ via the equation $x = C^{-1}\tilde{x}$. However, it is possible to avoid explicit reference to the matrix $C^{-1}$ by defining $\tilde{p}_k = Cp_k$, $\tilde{x}_k = Cx_k$, and $\tilde{r}_k = C^{-1}r_k$. Indeed, if we substitute these definitions into (10.3.2) and recall that $\tilde{b} = C^{-1}b$ and $\tilde{x} = Cx$, then we obtain


    k = 0
    x_0 = initial guess (Ax_0 ≈ b)
    r_0 = b - Ax_0
    while C^{-1}r_k ≠ 0
        k = k + 1
        if k = 1
            Cp_1 = C^{-1}r_0
        else
            β_k = (C^{-1}r_{k-1})^T (C^{-1}r_{k-1}) / (C^{-1}r_{k-2})^T (C^{-1}r_{k-2})
            Cp_k = C^{-1}r_{k-1} + β_k Cp_{k-1}
        end
        α_k = (C^{-1}r_{k-1})^T (C^{-1}r_{k-1}) / (Cp_k)^T (C^{-1}AC^{-1}) (Cp_k)
        Cx_k = Cx_{k-1} + α_k Cp_k
        C^{-1}r_k = C^{-1}r_{k-1} - α_k (C^{-1}AC^{-1}) Cp_k
    end
    Cx = Cx_k                                                 (10.3.3)
If we define the preconditioner $M$ by $M = C^2$ (also positive definite) and let $z_k$ be the solution of the system $Mz_k = r_k$, then (10.3.3) simplifies to


Algorithm 10.3.1 [Preconditioned Conjugate Gradients] Given a symmetric positive definite $A \in \mathbb{R}^{n\times n}$, $b \in \mathbb{R}^n$, a symmetric positive definite preconditioner $M$, and an initial guess $x_0$ ($Ax_0 \approx b$), the following algorithm solves the linear system $Ax = b$.


    k = 0
    r_0 = b - Ax_0
    while (r_k ≠ 0)
        Solve Mz_k = r_k.
        k = k + 1
        if k = 1
            p_1 = z_0
        else
            β_k = r_{k-1}^T z_{k-1} / r_{k-2}^T z_{k-2}
            p_k = z_{k-1} + β_k p_{k-1}
        end
        α_k = r_{k-1}^T z_{k-1} / p_k^T A p_k
        x_k = x_{k-1} + α_k p_k
        r_k = r_{k-1} - α_k A p_k
    end
    x = x_k
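Algorithm 10.3.1 translates almost line for line into code. In the following pure-Python sketch the preconditioner enters only through a user-supplied function `solve_M(r)` that returns $z$ with $Mz = r$. The example uses the simple diagonal (Jacobi) choice $M = \mathrm{diag}(A)$ on a badly scaled tridiagonal matrix $A = DTD$. The function names, the stopping rule, and the test problem are illustrative assumptions, not part of the algorithm's specification.

```python
import math


def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pcg(A, b, x0, solve_M, tol=1e-10, maxit=None):
    """Preconditioned CG in the pattern of Algorithm 10.3.1.

    solve_M(r) must return z with M z = r, M symmetric positive definite.
    Stops when ||r_k||_2 <= tol * ||b||_2; returns (x_k, k)."""
    maxit = 2 * len(b) if maxit is None else maxit
    x = list(x0)
    r = [bi - ai for bi, ai in zip(b, matvec(A, x))]
    bnorm = math.sqrt(dot(b, b))
    k, p, rz_old = 0, None, None
    while math.sqrt(dot(r, r)) > tol * bnorm and k < maxit:
        z = solve_M(r)                  # solve M z_{k-1} = r_{k-1}
        rz = dot(r, z)                  # r_{k-1}^T z_{k-1}
        k += 1
        if k == 1:
            p = list(z)
        else:
            beta = rz / rz_old
            p = [zi + beta * pi for zi, pi in zip(z, p)]
        w = matvec(A, p)                # A p_k
        alpha = rz / dot(p, w)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * wi for ri, wi in zip(r, w)]
        rz_old = rz
    return x, k


# Illustrative badly scaled SPD matrix A = D T D with T = tridiag(-1, 2.5, -1).
n = 6
s = [2.0 ** i for i in range(n)]
A = [[s[i] * s[j] * (2.5 if i == j else -1.0 if abs(i - j) == 1 else 0.0)
      for j in range(n)] for i in range(n)]
b = matvec(A, [1.0] * n)                # exact solution is the vector of ones


def jacobi(r):
    """Diagonal preconditioner M = diag(A)."""
    return [ri / A[i][i] for i, ri in enumerate(r)]


x, k = pcg(A, b, [0.0] * n, jacobi)
x_plain, k_plain = pcg(A, b, [0.0] * n, lambda r: list(r))   # M = I: plain CG
print("PCG steps:", k, " plain CG steps:", k_plain)
```

Here $M = 2.5D^2$, so $C = \sqrt{2.5}\,D$ and $\tilde{A} = C^{-1}AC^{-1} = T/2.5$ is well conditioned even though $A$ is not.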


A number of important observations should be made about this procedure: 


• It can be shown that the residuals and search directions satisfy

$$r_j^T M^{-1} r_i = 0, \qquad i \neq j, \qquad (10.3.4)$$
$$(Cp_j)^T (C^{-1}AC^{-1}) (Cp_i) = p_j^TAp_i = 0, \qquad i \neq j. \qquad (10.3.5)$$

• The denominators $r_{k-2}^Tz_{k-2} = z_{k-2}^TMz_{k-2}$ never vanish because $M$ is positive definite.

• Although the transformation $C$ figured heavily in the derivation of the algorithm, its action is only felt through the preconditioner $M = C^2$.

• For Algorithm 10.3.1 to be an effective sparse matrix technique, linear systems of the form $Mz = r$ must be easily solved and convergence must be rapid.


The choice of a good preconditioner can have a dramatic effect upon the
rate of convergence. Some of the possibilities are now discussed.
10.3.2 Incomplete Cholesky Preconditioners


One of the most important preconditioning strategies involves computing an incomplete Cholesky factorization of $A$. The idea behind this approach is to calculate a lower triangular matrix $H$ with the property that $H$ has some tractable sparsity structure and is somehow “close” to $A$'s exact Cholesky factor $G$. The preconditioner is then taken to be $M = HH^T$. To appreciate this choice consider the following facts:

• There exists a unique symmetric positive definite matrix $C$ such that $M = C^2$.
• There exists an orthogonal $Q$ such that $C = QH^T$, i.e., $H^T$ is the upper triangular factor of a QR factorization of $C$.

We therefore obtain the heuristic

$$\tilde{A} = C^{-1}AC^{-1} = C^{-T}AC^{-1} = (HQ^T)^{-1}A(QH^T)^{-1} = Q(H^{-1}GG^TH^{-T})Q^T \approx I. \qquad (10.3.6)$$

Thus, the better $H$ approximates $G$, the smaller the condition of $\tilde{A}$, and the better the performance of Algorithm 10.3.1.
An easy but effective way to determine such a simple $H$ that approximates $G$ is to step through the Cholesky reduction setting $h_{ij}$ to zero if the corresponding $a_{ij}$ is zero. Pursuing this with the outer product version of Cholesky we obtain


    for k = 1:n
        A(k,k) = sqrt(A(k,k))
        for i = k+1:n
            if A(i,k) ≠ 0
                A(i,k) = A(i,k)/A(k,k)
            end
        end
        for j = k+1:n
            for i = j:n
                if A(i,j) ≠ 0
                    A(i,j) = A(i,j) - A(i,k)A(j,k)
                end
            end
        end
    end                                                       (10.3.7)


In practice, the matrix A and its incomplete Cholesky factor H would 
be stored in an appropriate data structure and the looping in the above 
algorithm would take on a very special appearance. 

Unfortunately, (10.3.7) is not always stable. Classes of positive definite 
matrices for which incomplete Cholesky is stable are identified in Manteuffel 
(1979). See also Elman (1986). 
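Here is a pure-Python sketch of the incomplete Cholesky procedure (10.3.7) on a dense copy of $A$. A real code would use a sparse data structure, as noted above. One deliberate difference from (10.3.7): the zero test is made against the sparsity pattern of the original $A$, not against the current, partially updated entries, so an entry that happens to cancel to zero is still kept. The function raises an error if a nonpositive pivot appears, reflecting the possible instability just mentioned. The function name and the 4-by-4 test matrix (the 5-point Laplacian on a 2-by-2 grid) are illustrative assumptions.

```python
import math


def incomplete_cholesky(A):
    """IC(0) sketch of (10.3.7) on a dense list-of-rows copy of symmetric A.

    Returns a lower triangular H whose nonzeros lie inside the lower-triangular
    sparsity pattern of the original A (no fill-in is ever created)."""
    n = len(A)
    pattern = [[A[i][j] != 0.0 for j in range(n)] for i in range(n)]
    H = [[A[i][j] if j <= i else 0.0 for j in range(n)] for i in range(n)]
    for k in range(n):
        if H[k][k] <= 0.0:
            raise ValueError("nonpositive pivot: incomplete Cholesky broke down")
        H[k][k] = math.sqrt(H[k][k])
        for i in range(k + 1, n):
            if pattern[i][k]:
                H[i][k] /= H[k][k]
        for j in range(k + 1, n):
            for i in range(j, n):
                if pattern[i][j]:
                    H[i][j] -= H[i][k] * H[j][k]
    return H


# 5-point Laplacian on a 2-by-2 grid; exact Cholesky would fill in position (3, 2).
A = [[4.0, -1.0, -1.0, 0.0],
     [-1.0, 4.0, 0.0, -1.0],
     [-1.0, 0.0, 4.0, -1.0],
     [0.0, -1.0, -1.0, 4.0]]
H = incomplete_cholesky(A)
for row in H:
    print(["%.4f" % v for v in row])
```

The product $HH^T$ reproduces $A$ exactly on $A$'s sparsity pattern. The discarded fill shows up only as nonzero entries of $HH^T - A$ in positions where $a_{ij} = 0$.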
10.3.3 Incomplete Block Preconditioners


As with just about everything else in this book, the incomplete factoriza- 
tion ideas outlined in the previous subsection have a block analog. We 
illustrate this by looking at the incomplete block Cholesky factorization of 
the symmetric, positive definite, block tridiagonal matrix 


$$A = \begin{bmatrix} A_1 & E_1^T & 0 \\ E_1 & A_2 & E_2^T \\ 0 & E_2 & A_3 \end{bmatrix}.$$


For purposes of illustration, we assume that the $A_i$ are tridiagonal and the $E_i$ are diagonal. Matrices with this structure arise from the standard 5-point discretization of self-adjoint elliptic partial differential equations over
a two-dimensional domain. 

The 3-by-3 case is sufficiently general. Our discussion is based upon 
Concus, Golub, and Meurant (1985). Let 


$$G = \begin{bmatrix} G_1 & 0 & 0 \\ F_1 & G_2 & 0 \\ 0 & F_2 & G_3 \end{bmatrix}$$

be the exact block Cholesky factor of $A$. Although $G$ is sparse as a block matrix, the individual blocks are dense with the exception of $G_1$. This can be seen from the required computations:


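$$\begin{aligned}
G_1G_1^T &= B_1 = A_1 \\
F_1 &= E_1G_1^{-T} \\
G_2G_2^T &= B_2 = A_2 - F_1F_1^T = A_2 - E_1B_1^{-1}E_1^T \\
F_2 &= E_2G_2^{-T} \\
G_3G_3^T &= B_3 = A_3 - F_2F_2^T = A_3 - E_2B_2^{-1}E_2^T
\end{aligned}$$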


We therefore seek an approximate block Cholesky factor of the form

$$\tilde{G} = \begin{bmatrix} \tilde{G}_1 & 0 & 0 \\ \tilde{F}_1 & \tilde{G}_2 & 0 \\ 0 & \tilde{F}_2 & \tilde{G}_3 \end{bmatrix}$$

so that we can easily solve systems that involve the preconditioner $M = \tilde{G}\tilde{G}^T$. This involves the imposition of sparsity on $\tilde{G}$'s blocks, and here is a reasonable approach given that the $A_i$ are tridiagonal and the $E_i$ are diagonal:
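$$\begin{aligned}
\tilde{G}_1\tilde{G}_1^T &= \tilde{B}_1 = A_1 \\
\tilde{F}_1 &= E_1\tilde{G}_1^{-T} \\
\tilde{G}_2\tilde{G}_2^T &= \tilde{B}_2 = A_2 - E_1\Lambda_1E_1^T, \qquad \Lambda_1 \text{ (tridiagonal)} \approx \tilde{B}_1^{-1} \\
\tilde{F}_2 &= E_2\tilde{G}_2^{-T} \\
\tilde{G}_3\tilde{G}_3^T &= \tilde{B}_3 = A_3 - E_2\Lambda_2E_2^T, \qquad \Lambda_2 \text{ (tridiagonal)} \approx \tilde{B}_2^{-1}
\end{aligned}$$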


Note that all the $\tilde{B}_i$ are tridiagonal. Clearly, the $\Lambda_i$ must be carefully chosen to ensure that the $\tilde{B}_i$ are also symmetric and positive definite. It then follows that the $\tilde{G}_i$ are lower bidiagonal. The $\tilde{F}_i$ are full, but they need not be explicitly formed. For example, in the course of solving the system $Mz = r$ we must solve a system of the form


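$$\begin{bmatrix} \tilde{G}_1 & 0 & 0 \\ \tilde{F}_1 & \tilde{G}_2 & 0 \\ 0 & \tilde{F}_2 & \tilde{G}_3 \end{bmatrix} \begin{bmatrix} w_1 \\ w_2 \\ w_3 \end{bmatrix} = \begin{bmatrix} r_1 \\ r_2 \\ r_3 \end{bmatrix}.$$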


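Forward elimination can be used to carry out matrix-vector products that involve the $\tilde{F}_i = E_i\tilde{G}_i^{-T}$:

$$\begin{aligned}
\tilde{G}_1w_1 &= r_1 \\
\tilde{G}_2w_2 &= r_2 - \tilde{F}_1w_1 = r_2 - E_1\tilde{G}_1^{-T}w_1 \\
\tilde{G}_3w_3 &= r_3 - \tilde{F}_2w_2 = r_3 - E_2\tilde{G}_2^{-T}w_2
\end{aligned}$$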
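The forward elimination above never forms $\tilde{F}_i$. Each product $\tilde{F}_{i-1}w_{i-1}$ costs one upper bidiagonal solve with $\tilde{G}_{i-1}^T$ followed by a diagonal scaling by $E_{i-1}$. The pure-Python sketch below follows that pattern. It assumes each $\tilde{G}_i$ is stored as its diagonal `d` and subdiagonal `l`, and each $E_i$ as a vector of its diagonal entries. The storage scheme, the function names, and the numbers are illustrative assumptions.

```python
def lower_bidiag_solve(d, l, rhs):
    """Solve G w = rhs, G lower bidiagonal with G[j][j] = d[j], G[j+1][j] = l[j]."""
    w = []
    for j in range(len(d)):
        s = rhs[j] - (l[j - 1] * w[j - 1] if j > 0 else 0.0)
        w.append(s / d[j])
    return w


def lower_bidiag_transpose_solve(d, l, rhs):
    """Solve G^T t = rhs by back substitution (G^T is upper bidiagonal)."""
    m = len(d)
    t = [0.0] * m
    for j in range(m - 1, -1, -1):
        s = rhs[j] - (l[j] * t[j + 1] if j < m - 1 else 0.0)
        t[j] = s / d[j]
    return t


def block_forward_solve(G_blocks, E_diags, r_blocks):
    """Block forward elimination with F~_i = E_i G~_i^{-T} applied, never formed.

    G_blocks[i] = (d, l) stores G~_{i+1}; E_diags[i] is the diagonal of E_{i+1}."""
    w = [lower_bidiag_solve(*G_blocks[0], r_blocks[0])]
    for i in range(1, len(G_blocks)):
        # t = G~_i^{-T} w_i (1-based block labels), then rhs = r_{i+1} - E_i t
        t = lower_bidiag_transpose_solve(*G_blocks[i - 1], w[i - 1])
        rhs = [ri - ei * ti for ri, ei, ti in zip(r_blocks[i], E_diags[i - 1], t)]
        w.append(lower_bidiag_solve(*G_blocks[i], rhs))
    return w


# Illustrative data: three 3-by-3 diagonal blocks.
G_blocks = [([2.0, 2.0, 2.0], [-0.5, -0.5]),
            ([3.0, 2.5, 2.0], [-1.0, -0.5]),
            ([2.0, 2.0, 1.5], [-0.5, -1.0])]
E_diags = [[0.3, -0.2, 0.1], [0.5, 0.5, -0.4]]
r_blocks = [[1.0, 2.0, 3.0], [0.0, -1.0, 1.0], [2.0, 0.5, -1.0]]
w = block_forward_solve(G_blocks, E_diags, r_blocks)
print(w)
```

Every step costs $O(m)$ flops per block, which is the point of keeping the $\tilde{G}_i$ bidiagonal and the $E_i$ diagonal.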


The choice of $\Lambda_i$ is delicate as the resulting $\tilde{B}_i$ must be positive definite. As we have organized the computation, the central issue is how to approximate the inverse of an $m$-by-$m$ symmetric, positive definite, tridiagonal matrix $T = (t_{ij})$ with a symmetric tridiagonal matrix $\Lambda$. There are several reasonable approaches:


• Set $\Lambda = \mathrm{diag}(1/t_{11}, \ldots, 1/t_{mm})$.


• Take $\Lambda$ to be the tridiagonal part of $T^{-1}$. This can be efficiently computed since there exist $u, v \in \mathbb{R}^m$ such that the lower triangular part of $T^{-1}$ is the lower triangular part of $uv^T$. See Asplund (1959).


• Set $\Lambda = U^TU$ where $U$ is the lower bidiagonal portion of $G^{-1}$, where $T = GG^T$ is the Cholesky factorization. This can be found in $O(m)$ flops.


For a discussion of these approximations and what they imply about the 
associated preconditioners, see Concus, Golub, and Meurant (1985). 
10.3.4 Domain Decomposition Ideas


The numerical solution of elliptic partial differential equations often leads 
to linear systems of the form 


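$$\begin{bmatrix} A_1 & 0 & \cdots & 0 & B_1 \\ 0 & A_2 & \cdots & 0 & B_2 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & A_p & B_p \\ B_1^T & B_2^T & \cdots & B_p^T & Q \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_p \\ z \end{bmatrix} = \begin{bmatrix} d_1 \\ d_2 \\ \vdots \\ d_p \\ f \end{bmatrix} \qquad (10.3.8)$$

if the unknowns are properly sequenced. See Meurant (1984). Here, the $A_i$ are symmetric positive definite, the $B_i$ are sparse, and the last block column is generally much narrower than the others.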

An example with p = 2 serves to connect (10.3.8) and its block structure 
with the underlying problem geometry and the chosen domain decomposi- 
tion. Suppose we are to solve Poisson’s equation on the following domain: 


[Figure: the mesh, divided into a top subdomain (“+” points) and a bottom subdomain (“x” points), with interface points (“*”) between them.]


With the usual discretization, an unknown at a mesh point is coupled only to its “north”, “east”, “south”, and “west” neighbors. There are three “types” of variables: those interior to the top subdomain (aggregated in the subvector $x_1$ and associated with the “+” mesh points), those interior to the bottom subdomain (aggregated in the subvector $x_2$ and associated with the “x” mesh points), and those on the interface between the two subdomains (aggregated in the subvector $z$ and associated with the “*” mesh points). Note that the interior unknowns of one subdomain are not coupled to the interior unknowns of another subdomain, which accounts for the zero blocks in (10.3.8). Also observe that the number of interface unknowns is typically small compared to the overall number of unknowns.

Now let us explore the preconditioning possibilities associated with 
(10.3.8). We continue with the $p = 2$ case for simplicity. If we set

$$M = L \begin{bmatrix} M_1^{-1} & 0 & 0 \\ 0 & M_2^{-1} & 0 \\ 0 & 0 & S^{-1} \end{bmatrix} L^T$$

where

$$L = \begin{bmatrix} M_1 & 0 & 0 \\ 0 & M_2 & 0 \\ B_1^T & B_2^T & S \end{bmatrix},$$

then

$$M = \begin{bmatrix} M_1 & 0 & B_1 \\ 0 & M_2 & B_2 \\ B_1^T & B_2^T & S_* \end{bmatrix} \qquad (10.3.9)$$

with $S_* = S + B_1^TM_1^{-1}B_1 + B_2^TM_2^{-1}B_2$. Let us consider how we might choose the block parameters $M_1$, $M_2$, and $S$ so as to produce an effective preconditioner.

If we compare (10.3.9) with the $p = 2$ version of (10.3.8), we see that it makes sense for $M_i$ to approximate $A_i$ and for $S_*$ to approximate $Q$. The latter is achieved if $S \approx Q - B_1^TM_1^{-1}B_1 - B_2^TM_2^{-1}B_2$. There are several approaches to selecting $S$ and they all address the fact that we cannot form the dense matrices $B_i^TM_i^{-1}B_i$. For example, as discussed in the previous subsection, tridiagonal approximations of the $M_i^{-1}$ could be used. See Meurant (1989).

If the subdomains are sufficiently regular and it is feasible to solve linear systems that involve the $A_i$ exactly (say by using a fast Poisson solver), then we can set $M_i = A_i$. It follows that $M = A + E$ where $\mathrm{rank}(E) \le m$, with $m$ being the number of interface unknowns. Thus, the preconditioned conjugate gradient algorithm would theoretically converge in $m + 1$ steps.

Regardless of the approximations that must be incorporated in the pro- 
cess, we see that there are significant opportunities for parallelism because 
the subdomain problems are decoupled. Indeed, the number of subdomains 
p is usually a function of both the problem geometry and the number of 
processors that are available for the computation. 


10.3.5 Polynomial Preconditioners 


The vector $z$ defined by the preconditioner system $Mz = r$ should be thought of as an approximate solution to $Az = r$ insofar as $M$ is an approximation of $A$. One way to obtain such an approximate solution is to apply $p$ steps of a stationary method $M_1z^{(j+1)} = N_1z^{(j)} + r$, $z^{(0)} = 0$. It follows that if $G = M_1^{-1}N_1$, then

$$z = z^{(p)} = (I + G + \cdots + G^{p-1})M_1^{-1}r.$$

Thus, if $M^{-1} = (I + G + \cdots + G^{p-1})M_1^{-1}$, then $Mz = r$ and we can think of $M$ as a preconditioner. Of course, it is important that $M$ be symmetric positive definite, and this constrains the choice of $M_1$, $N_1$, and $p$. Because $M$ is a polynomial in $G$, it is referred to as a polynomial preconditioner.
This type of preconditioner is attractive from the vector/parallel point of 
view and has therefore attracted considerable attention. 
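The identity $z^{(p)} = (I + G + \cdots + G^{p-1})M_1^{-1}r$ can be confirmed numerically. This pure-Python sketch assumes the Jacobi splitting $M_1 = \mathrm{diag}(A)$, $N_1 = M_1 - A$, an illustrative choice since the text leaves $M_1$ and $N_1$ open. It computes $z$ both by $p$ sweeps of the stationary method and by summing the series. Function names and the test matrix are assumptions.

```python
def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def jacobi_poly_apply(A, r, p):
    """z^(p) from p sweeps of M1 z^(j+1) = N1 z^(j) + r, z^(0) = 0,
    with the assumed Jacobi splitting M1 = diag(A), N1 = M1 - A."""
    n = len(r)
    d = [A[i][i] for i in range(n)]
    z = [0.0] * n
    for _ in range(p):
        Az = matvec(A, z)
        # N1 z + r = d*z - A z + r; dividing by d applies M1^{-1}
        z = [(d[i] * z[i] - Az[i] + r[i]) / d[i] for i in range(n)]
    return z


def jacobi_poly_series(A, r, p):
    """The same z from (I + G + ... + G^(p-1)) M1^{-1} r with G = M1^{-1} N1."""
    n = len(r)
    d = [A[i][i] for i in range(n)]
    term = [r[i] / d[i] for i in range(n)]          # M1^{-1} r
    z = list(term)
    for _ in range(p - 1):
        At = matvec(A, term)
        term = [(d[i] * term[i] - At[i]) / d[i] for i in range(n)]   # G * term
        z = [zi + ti for zi, ti in zip(z, term)]
    return z


A = [[4.0, -1.0, 0.0, 0.0],
     [-1.0, 4.0, -1.0, 0.0],
     [0.0, -1.0, 4.0, -1.0],
     [0.0, 0.0, -1.0, 4.0]]
r = [1.0, 2.0, 3.0, 4.0]
for p in range(1, 5):
    print(p, jacobi_poly_apply(A, r, p), jacobi_poly_series(A, r, p))
```

Applying $M^{-1}$ this way needs only products with $A$ and divisions by its diagonal, which is why such preconditioners appeal on vector and parallel machines.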


10.3.6 Another Perspective 


The polynomial preconditioner discussion points to an important connection between the classical iterations and the preconditioned conjugate gradient algorithm. Many iterative methods have as their basic step

$$x_k = x_{k-2} + \omega_k(\gamma_kz_{k-1} + x_{k-1} - x_{k-2}) \qquad (10.3.10)$$

where $Mz_{k-1} = r_{k-1} = b - Ax_{k-1}$. For example, if we set $\omega_k = 1$ and $\gamma_k = 1$, then

$$x_k = M^{-1}(b - Ax_{k-1}) + x_{k-1},$$

i.e., $Mx_k = Nx_{k-1} + b$, where $A = M - N$. Thus, the Jacobi, Gauss-Seidel, SOR, and SSOR methods of §10.1 have the form (10.3.10). So also does the Chebyshev semi-iterative method (10.1.12).

Following Concus, Golub, and O'Leary (1976), it is also possible to organize Algorithm 10.3.1 with a central step of the form (10.3.10):

    x_{-1} = 0; x_0 = initial guess; k = 0; r_0 = b - Ax_0
    while r_k ≠ 0
        k = k + 1
        Solve Mz_{k-1} = r_{k-1} for z_{k-1}.
        γ_{k-1} = z_{k-1}^T M z_{k-1} / z_{k-1}^T A z_{k-1}
        if k = 1
            ω_1 = 1
        else
            ω_k = ( 1 - (γ_{k-1}/γ_{k-2}) (z_{k-1}^T M z_{k-1} / z_{k-2}^T M z_{k-2}) (1/ω_{k-1}) )^{-1}
        end
        x_k = x_{k-2} + ω_k(γ_{k-1}z_{k-1} + x_{k-1} - x_{k-2})
        r_k = b - Ax_k
    end
    x = x_k                                                   (10.3.11)
Thus, we can think of the scalars $\omega_k$ and $\gamma_k$ in (10.3.11) as acceleration parameters that can be chosen to speed the convergence of the iteration $Mx_k = Nx_{k-1} + b$. Hence, any iterative method based on the splitting $A = M - N$ can be accelerated by the conjugate gradient algorithm as long as $M$ (the preconditioner) is symmetric and positive definite.


Problems 


P10.3.1 Detail an incomplete factorization procedure that is based on gaxpy Cholesky, 
i.e., Algorithm 4.2.1. 


P10.3.2 How many $n$-vectors of storage are required by a practical implementation of Algorithm 10.3.1? Ignore workspaces that may be required when $Mz = r$ is solved.
10.4 Other Krylov Subspace Methods


The conjugate gradient method presented over the previous two sections 
is applicable to symmetric positive definite systems. The MINRES and 
SYMMLQ variants developed in §9.3.2 in connection with the symmetric 
Lanczos process can handle symmetric indefinite systems. Now we push 
the generalizations even further in pursuit of iterative methods that are 
applicable to unsymmetric systems. 

The discussion is patterned after the survey article by Freund, Golub, 
and Nachtigal (1992) and Chapter 9 of Golub and Ortega (1993). We focus 
on cg-type algorithms that involve optimization over Krylov spaces. 
Bear in mind that there is a large gap between our algorithmic specifications and production software. A good place to build an appreciation for this point is the Templates book by Barrett et al. (1993). The book by Saad (1996) is also highly recommended.


10.4.1 Normal Equation Approaches


The method of normal equations for the least squares problem is appealing because it allows us to use simple “Cholesky technology” instead of more complicated methods that involve orthogonalization. Likewise, in the unsymmetric $Ax = b$ problem it is tempting to solve the equivalent symmetric positive definite system

$$A^TAx = A^Tb$$

using existing conjugate gradient technology. Indeed, if we make the substitution $A \leftarrow A^TA$ in Algorithm 10.2.1 and note that a normal equation residual $A^Tb - A^TAx_k$ is $A^T$ times the “true” residual $b - Ax_k$, then we obtain the Conjugate Gradient Normal Equation Residual method:


Algorithm 10.4.1 [CGNR] If $A \in \mathbb{R}^{n\times n}$ is nonsingular, $b \in \mathbb{R}^n$, and $x_0 \in \mathbb{R}^n$ is an initial guess ($Ax_0 \approx b$), then the following algorithm computes $x \in \mathbb{R}^n$ so $Ax = b$.

    k = 0
    r_0 = b - Ax_0
    while r_k ≠ 0
        k = k + 1
        if k = 1
            p_1 = A^T r_0
        else
            β_k = (A^T r_{k-1})^T (A^T r_{k-1}) / (A^T r_{k-2})^T (A^T r_{k-2})
            p_k = A^T r_{k-1} + β_k p_{k-1}
        end
        α_k = (A^T r_{k-1})^T (A^T r_{k-1}) / (Ap_k)^T (Ap_k)
        x_k = x_{k-1} + α_k p_k
        r_k = r_{k-1} - α_k Ap_k
    end
    x = x_k
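A pure-Python sketch of Algorithm 10.4.1 follows. It never forms $A^TA$; each step uses one product with $A$ and one with $A^T$. The helper names, the stopping rule, and the small nonsymmetric test matrix are illustrative assumptions.

```python
import math


def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def matvec_t(A, y):
    """A^T y for a list-of-rows matrix A."""
    return [sum(A[i][j] * y[i] for i in range(len(A))) for j in range(len(A[0]))]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cgnr(A, b, x0, tol=1e-10, maxit=None):
    """Algorithm 10.4.1 (CGNR); returns (x_k, k)."""
    maxit = 4 * len(b) if maxit is None else maxit
    x = list(x0)
    r = [bi - ai for bi, ai in zip(b, matvec(A, x))]
    bnorm = math.sqrt(dot(b, b))
    k, p, gamma_old = 0, None, None
    while math.sqrt(dot(r, r)) > tol * bnorm and k < maxit:
        s = matvec_t(A, r)              # A^T r_{k-1}: the normal-equation residual
        gamma = dot(s, s)
        k += 1
        if k == 1:
            p = list(s)
        else:
            beta = gamma / gamma_old
            p = [si + beta * pi for si, pi in zip(s, p)]
        w = matvec(A, p)                # A p_k
        alpha = gamma / dot(w, w)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * wi for ri, wi in zip(r, w)]
        gamma_old = gamma
    return x, k


A = [[4.0, 1.0, 0.0], [2.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
x, steps = cgnr(A, b, [0.0, 0.0, 0.0])
print("CGNR steps:", steps, "x =", x)
```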


Another way to make an unsymmetric $Ax = b$ problem “cg-friendly” is to work with the system

$$AA^Ty = b, \qquad x = A^Ty.$$

In “y-space” the cg algorithm takes on the following form:
    k = 0
    y_0 = initial guess (AA^T y_0 ≈ b)
    r_0 = b - AA^T y_0
    while r_k ≠ 0
        k = k + 1
        if k = 1
            p_1 = r_0
        else
            β_k = r_{k-1}^T r_{k-1} / r_{k-2}^T r_{k-2}
            p_k = r_{k-1} + β_k p_{k-1}
        end
        α_k = r_{k-1}^T r_{k-1} / p_k^T AA^T p_k
        y_k = y_{k-1} + α_k p_k
        r_k = r_{k-1} - α_k AA^T p_k
    end
    y = y_k


Making the substitutions $x_k \leftarrow A^Ty_k$ and $p_k \leftarrow A^Tp_k$ and simplifying, we obtain the Conjugate Gradient Normal Equation Error method:


Algorithm 10.4.2 [CGNE] If $A \in \mathbb{R}^{n\times n}$ is nonsingular, $b \in \mathbb{R}^n$, and $x_0 \in \mathbb{R}^n$ is an initial guess ($Ax_0 \approx b$), then the following algorithm computes $x \in \mathbb{R}^n$ so $Ax = b$.


    k = 0
    r_0 = b - Ax_0
    while r_k ≠ 0
        k = k + 1
        if k = 1
            p_1 = A^T r_0
        else
            β_k = r_{k-1}^T r_{k-1} / r_{k-2}^T r_{k-2}
            p_k = A^T r_{k-1} + β_k p_{k-1}
        end
        α_k = r_{k-1}^T r_{k-1} / p_k^T p_k
        x_k = x_{k-1} + α_k p_k
        r_k = r_{k-1} - α_k Ap_k
    end
    x = x_k
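The corresponding pure-Python sketch of Algorithm 10.4.2 differs from CGNR only in its scalars: $\alpha_k$ and $\beta_k$ are built from the true residuals $r_k$ and from $p_k^Tp_k$. As before, the helper names, the stopping rule, and the test matrix are illustrative assumptions.

```python
import math


def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def matvec_t(A, y):
    """A^T y for a list-of-rows matrix A."""
    return [sum(A[i][j] * y[i] for i in range(len(A))) for j in range(len(A[0]))]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cgne(A, b, x0, tol=1e-10, maxit=None):
    """Algorithm 10.4.2 (CGNE): cg on A A^T y = b, carried out in x = A^T y."""
    maxit = 4 * len(b) if maxit is None else maxit
    x = list(x0)
    r = [bi - ai for bi, ai in zip(b, matvec(A, x))]
    rho = dot(r, r)
    bnorm = math.sqrt(dot(b, b))
    k, p, rho_old = 0, None, None
    while math.sqrt(rho) > tol * bnorm and k < maxit:
        s = matvec_t(A, r)              # A^T r_{k-1}
        k += 1
        if k == 1:
            p = s
        else:
            beta = rho / rho_old
            p = [si + beta * pi for si, pi in zip(s, p)]
        alpha = rho / dot(p, p)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        w = matvec(A, p)                # A p_k
        r = [ri - alpha * wi for ri, wi in zip(r, w)]
        rho_old, rho = rho, dot(r, r)
    return x, k


A = [[4.0, 1.0, 0.0], [2.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
x, steps = cgne(A, b, [0.0, 0.0, 0.0])
print("CGNE steps:", steps, "x =", x)
```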


In general, these two normal equation approaches are handicapped by the squaring of the condition number. (Recall Theorem 10.2.6.) However, there are some occasions where they are effective, and we refer the reader to Freund, Golub, and Nachtigal (1991).
10.4.2 A Note on Objective Functions


Based on what we know about the cg method, the CGNR iterate $x_k$ minimizes

$$\phi_1(x) = \frac{1}{2}x^T(A^TA)x - x^TA^Tb$$

over the set

$$S_k^{(NR)} = x_0 + \mathcal{K}(A^TA, A^Tr_0, k).$$

It is easy to show that

$$\frac{1}{2}\|b - Ax\|_2^2 = \phi_1(x) + \frac{1}{2}b^Tb,$$

and so $x_k$ minimizes the residual $\|b - Ax\|_2$ over $S_k^{(NR)}$. The “R” in “CGNR” is there because of the residual-based optimization.

On the other hand, the CGNE (implicit) iterate $y_k$ minimizes

$$\phi_2(y) = \frac{1}{2}y^T(AA^T)y - y^Tb$$

over the set $y_0 + \mathcal{K}(AA^T, b - AA^Ty_0, k)$. With the change of variable $x = A^Ty$ it can be shown that $x_k$ minimizes

$$\frac{1}{2}\|A^{-1}b - x\|_2^2 = \phi_2(y) + \frac{1}{2}\|A^{-1}b\|_2^2$$

over

$$S_k^{(NE)} = x_0 + \mathcal{K}(A^TA, A^Tr_0, k). \qquad (10.4.1)$$

Thus CGNE minimizes the error at each step, and that explains the “E” in “CGNE”.
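Both identities are easy to check numerically. The pure-Python sketch below evaluates $\phi_1$ and $\phi_2$ at a few arbitrary points for an illustrative nonsingular $A$. The right-hand side is built as $b = Ax_\ast$, so that $A^{-1}b = x_\ast$ is known without solving a system. The matrix, the points, and the function names are assumptions made for the example.

```python
def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def matvec_t(A, y):
    return [sum(A[i][j] * y[i] for i in range(len(A))) for j in range(len(A[0]))]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


A = [[4.0, 1.0, 0.0], [2.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
x_star = [1.0, -2.0, 0.5]
b = matvec(A, x_star)                   # so that A^{-1} b = x_star


def phi1(x):
    """phi_1(x) = 1/2 x^T (A^T A) x - x^T A^T b."""
    return 0.5 * dot(x, matvec_t(A, matvec(A, x))) - dot(x, matvec_t(A, b))


def phi2(y):
    """phi_2(y) = 1/2 y^T (A A^T) y - y^T b."""
    return 0.5 * dot(y, matvec(A, matvec_t(A, y))) - dot(y, b)


gaps = []
for v in ([0.0, 0.0, 0.0], [1.0, 2.0, 3.0], [-0.5, 0.25, 4.0]):
    res = [bi - ai for bi, ai in zip(b, matvec(A, v))]
    gaps.append(0.5 * dot(res, res) - (phi1(v) + 0.5 * dot(b, b)))     # CGNR identity
    x = matvec_t(A, v)                                                  # x = A^T y, y = v
    err = [xs - xi for xs, xi in zip(x_star, x)]
    gaps.append(0.5 * dot(err, err) - (phi2(v) + 0.5 * dot(x_star, x_star)))  # CGNE
print(max(abs(g) for g in gaps))
```

The printed value is the largest discrepancy over all checks and should be at roundoff level.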


10.4.3 The Conjugate Residual Method


Recall that if $A$ is symmetric positive definite, then it has a symmetric positive definite square root $A^{1/2}$. (See §4.2.10.) Note that in this case $Ax = b$ and

$$A^{1/2}x = A^{-1/2}b$$

are equivalent and that the former is the normal equation version of the latter. If we apply CGNR to this square root system and simplify the results, then we obtain


Algorithm 10.4.3 [Conjugate Residuals] If $A \in \mathbb{R}^{n\times n}$ is symmetric positive definite, $b \in \mathbb{R}^n$, and $x_0 \in \mathbb{R}^n$ is an initial guess ($Ax_0 \approx b$), then the following algorithm computes $x \in \mathbb{R}^n$ so $Ax = b$.

    k = 0
    r_0 = b - Ax_0
    while r_k ≠ 0
        k = k + 1
        if k = 1
            p_1 = r_0
            Ap_1 = Ar_0
        else
            β_k = r_{k-1}^T A r_{k-1} / r_{k-2}^T A r_{k-2}
            p_k = r_{k-1} + β_k p_{k-1}
            Ap_k = Ar_{k-1} + β_k Ap_{k-1}
        end
        α_k = r_{k-1}^T A r_{k-1} / (Ap_k)^T (Ap_k)
        x_k = x_{k-1} + α_k p_k
        r_k = r_{k-1} - α_k Ap_k
    end
    x = x_k


It follows from our comments about CGNR that $\|A^{-1/2}(b - Ax)\|_2$ is minimized over the set $x_0 + \mathcal{K}(A, r_0, k)$ during the $k$th iteration.
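A pure-Python transcription of Algorithm 10.4.3 follows. It keeps $Ar_{k-1}$ and $Ap_k$ up to date through recurrences, so each step needs only one product with $A$. The helper names, the stopping rule, and the SPD tridiagonal test matrix are illustrative assumptions.

```python
import math


def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def conjugate_residuals(A, b, x0, tol=1e-10, maxit=None):
    """Algorithm 10.4.3 for symmetric positive definite A; returns (x_k, k)."""
    maxit = 4 * len(b) if maxit is None else maxit
    x = list(x0)
    r = [bi - ai for bi, ai in zip(b, matvec(A, x))]
    Ar = matvec(A, r)
    bnorm = math.sqrt(dot(b, b))
    k, p, Ap, rAr_old = 0, None, None, None
    while math.sqrt(dot(r, r)) > tol * bnorm and k < maxit:
        rAr = dot(r, Ar)                # r_{k-1}^T A r_{k-1}
        k += 1
        if k == 1:
            p, Ap = list(r), list(Ar)
        else:
            beta = rAr / rAr_old
            p = [ri + beta * pi for ri, pi in zip(r, p)]
            Ap = [ari + beta * api for ari, api in zip(Ar, Ap)]   # no extra product
        alpha = rAr / dot(Ap, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        Ar = matvec(A, r)               # the only product with A in each step
        rAr_old = rAr
    return x, k


n = 5
A = [[3.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
     for i in range(n)]
b = [1.0, 2.0, 3.0, 4.0, 5.0]
x, steps = conjugate_residuals(A, b, [0.0] * n)
print("CR steps:", steps)
```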


10.4.4 GMRES 


In §9.3.2 we briefly discussed the Lanczos-based MINRES method for symmetric, possibly indefinite, $Ax = b$ problems. In that method the iterate $x_k$ minimizes $\|b - Ax\|_2$ over the set

$$S_k = x_0 + \mathrm{span}\{r_0, Ar_0, \ldots, A^{k-1}r_0\} = x_0 + \mathcal{K}(A, r_0, k). \qquad (10.4.2)$$

The key idea behind the algorithm is to express $x_k$ in terms of the Lanczos vectors $q_1, q_2, \ldots, q_k$, which span $\mathcal{K}(A, r_0, k)$ if $q_1$ is a multiple of the initial residual $r_0 = b - Ax_0$.

In the Generalized Minimum Residual (GMRES) method of Saad and Schultz (1986) the same approach is taken except that the iterates are expressed in terms of Arnoldi vectors instead of Lanczos vectors in order to handle unsymmetric $A$. After $k$ steps of the Arnoldi iteration (9.4.1) we have the factorization

$$AQ_k = Q_{k+1}\tilde{H}_k \qquad (10.4.3)$$

where the columns of $Q_{k+1} = [\,Q_k \;\; q_{k+1}\,]$ are the orthonormal Arnoldi vectors and

$$\tilde{H}_k = \begin{bmatrix} h_{11} & h_{12} & \cdots & h_{1k} \\ h_{21} & h_{22} & \cdots & h_{2k} \\ 0 & h_{32} & \ddots & \vdots \\ \vdots & \ddots & \ddots & h_{kk} \\ 0 & \cdots & 0 & h_{k+1,k} \end{bmatrix} \in \mathbb{R}^{(k+1)\times k}$$
is upper Hessenberg. In the $k$th step of GMRES, $\|b - Ax_k\|_2$ is minimized subject to the constraint that $x_k$ has the form $x_k = x_0 + Q_ky_k$ for some $y_k \in \mathbb{R}^k$. If $q_1 = r_0/\rho_0$ where $\rho_0 = \|r_0\|_2$, then it follows that

$$\begin{aligned}
\|b - A(x_0 + Q_ky_k)\|_2 &= \|r_0 - AQ_ky_k\|_2 \\
&= \|r_0 - Q_{k+1}\tilde{H}_ky_k\|_2 \\
&= \|\rho_0e_1 - \tilde{H}_ky_k\|_2.
\end{aligned}$$

Thus, $y_k$ is the solution to a $(k+1)$-by-$k$ least squares problem and the GMRES iterate is given by $x_k = x_0 + Q_ky_k$.


Algorithm 10.4.4 [GMRES] If $A \in \mathbb{R}^{n\times n}$ is nonsingular, $b \in \mathbb{R}^n$, and $x_0 \in \mathbb{R}^n$ is an initial guess ($Ax_0 \approx b$), then the following algorithm computes $x \in \mathbb{R}^n$ so $Ax = b$.

    r_0 = b - Ax_0
    h_{10} = ||r_0||_2
    k = 0
    while (h_{k+1,k} > 0)
        q_{k+1} = r_k / h_{k+1,k}
        k = k + 1
        r_k = Aq_k
        for i = 1:k
            h_{ik} = q_i^T r_k
            r_k = r_k - h_{ik} q_i
        end
        h_{k+1,k} = ||r_k||_2
        x_k = x_0 + Q_k y_k where ||h_{10} e_1 - H̃_k y_k||_2 = min
    end
    x = x_k


It is easy to verify that

$$\|b - Ax_k\|_2 = \|h_{10}e_1 - \tilde{H}_ky_k\|_2.$$

The upper Hessenberg least squares problem can be efficiently solved using Givens rotations. In practice there is no need to form $x_k$ until one is happy with its residual.

The main problem with “unlimited GMRES” is that the $k$th iteration involves $O(kn)$ flops. Thus, like Arnoldi, a practical GMRES implementation requires a restart strategy to avoid excessive amounts of computation and memory traffic. For example, if at most $m$ steps are tolerable, then $x_m$ can be used as the initial vector for the next GMRES sequence.
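Putting the pieces together, here is a pure-Python sketch of restarted GMRES. It uses the modified Gram-Schmidt Arnoldi loop of Algorithm 10.4.4, applies Givens rotations column by column to $\tilde{H}_k$, and restarts after at most `m` steps. In exact arithmetic the last entry of the rotated right-hand side equals $\|b - Ax_k\|_2$, so the residual is monitored without forming $x_k$, and the iterate is assembled only at the end of a cycle. The restart length, tolerance, function names, and test matrix are assumptions chosen for illustration.

```python
import math


def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def gmres(A, b, x0, m=20, tol=1e-10, max_restarts=100):
    """Restarted GMRES(m) sketch; returns the final iterate."""
    x = list(x0)
    bnorm = math.sqrt(dot(b, b)) or 1.0
    for _ in range(max_restarts):
        r = [bi - ai for bi, ai in zip(b, matvec(A, x))]
        rho0 = math.sqrt(dot(r, r))
        if rho0 <= tol * bnorm:
            break
        Q = [[ri / rho0 for ri in r]]   # Arnoldi vectors q_1, q_2, ...
        R = []                          # rotated columns of H~_k (upper triangular)
        cs, sn = [], []                 # Givens rotation parameters
        g = [rho0]                      # rotated right-hand side, starts as rho_0 e_1
        steps = 0
        for j in range(m):
            w = matvec(A, Q[j])
            h = []
            for i in range(j + 1):      # modified Gram-Schmidt
                hij = dot(Q[i], w)
                w = [wk - hij * qk for wk, qk in zip(w, Q[i])]
                h.append(hij)
            h_next = math.sqrt(dot(w, w))
            for i in range(j):          # apply earlier rotations to the new column
                t = cs[i] * h[i] + sn[i] * h[i + 1]
                h[i + 1] = -sn[i] * h[i] + cs[i] * h[i + 1]
                h[i] = t
            denom = math.hypot(h[j], h_next)    # new rotation annihilates h_next
            cs.append(h[j] / denom)
            sn.append(h_next / denom)
            h[j] = denom
            g.append(-sn[j] * g[j])
            g[j] = cs[j] * g[j]
            R.append(h)
            steps = j + 1
            if abs(g[j + 1]) <= tol * bnorm or h_next == 0.0:
                break
            Q.append([wk / h_next for wk in w])
        y = [0.0] * steps               # back substitution with the triangular factor
        for i in range(steps - 1, -1, -1):
            acc = g[i] - sum(R[c][i] * y[c] for c in range(i + 1, steps))
            y[i] = acc / R[i][i]
        for c in range(steps):
            x = [xi + y[c] * qi for xi, qi in zip(x, Q[c])]
    return x


n = 6
A = [[4.0 if i == j else 1.0 if j == i + 1 else -2.0 if j == i - 1 else 0.0
      for j in range(n)] for i in range(n)]
b = [float(i + 1) for i in range(n)]
x = gmres(A, b, [0.0] * n, m=3)
```

Storing only `m` Arnoldi vectors is what bounds memory. The price is that information from earlier cycles is discarded at each restart.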
10.4.5 Preconditioning


Preconditioning is the other key to making GMRES effective. Analogous to the development of the preconditioned conjugate gradient method in §10.3, we obtain a nonsingular matrix $M = M_1M_2$ that approximates $A$ in some sense and then apply GMRES to the system $\tilde{A}\tilde{x} = \tilde{b}$ where $\tilde{A} = M_1^{-1}AM_2^{-1}$, $\tilde{b} = M_1^{-1}b$, and $\tilde{x} = M_2x$. If we write down the GMRES iteration for the tilde system and manipulate the equations to restore the original variables, then the resulting iteration requires the solution of linear systems that involve the preconditioner $M$. Thus, the act of finding a good preconditioner $M = M_1M_2$ is the act of making $\tilde{A} = M_1^{-1}AM_2^{-1}$ look as much as possible like the identity subject to the constraint that linear systems with $M$ are easy to solve.


10.4.6 The Biconjugate Gradient Method 


Just as Arnoldi underwrites GMRES, the unsymmetric Lanczos process underwrites the Biconjugate Gradient (BiCG) method. The starting point in the development of BiCG is to go back to the Lanczos derivation of the conjugate gradient method in §9.3.1. In terms of Lanczos vectors, the $k$th cg iterate is given by $x_k = x_0 + Q_ky_k$ where $Q_k$ is the matrix of Lanczos vectors, $T_k = Q_k^TAQ_k$ is tridiagonal, and $y_k$ solves $T_ky_k = Q_k^Tr_0$. Note that

$$Q_k^T(b - Ax_k) = Q_k^T(r_0 - AQ_ky_k) = 0.$$

Thus, we can characterize $x_k$ by insisting that it come from $x_0 + \mathcal{K}(A, r_0, k)$ and that it produce a residual that is orthogonal to a given subspace, say $\mathcal{K}(A, r_0, k)$.

In the unsymmetric case we can extend this notion by producing a sequence of iterates $\{x_k\}$ with the property that $x_k$ belongs to $x_0 + \mathcal{K}(A, r_0, k)$ and produces a residual that is orthogonal to $\mathcal{K}(A^T, s_0, k)$ for some $s_0 \in \mathbb{R}^n$. Simplifications occur if the unsymmetric Lanczos process is used to generate bases for the two involved Krylov spaces. In particular, after $k$ steps of the unsymmetric Lanczos algorithm (9.4.7) we have $Q_k, P_k \in \mathbb{R}^{n\times k}$ such that $P_k^TQ_k = I_k$, and a tridiagonal matrix $T_k = P_k^TAQ_k$ such that

$$\begin{aligned}
AQ_k &= Q_kT_k + r_ke_k^T, & P_k^Tr_k &= 0, \\
A^TP_k &= P_kT_k^T + s_ke_k^T, & Q_k^Ts_k &= 0.
\end{aligned} \qquad (10.4.4)$$

In BiCG we set $x_k = x_0 + Q_ky_k$ where $T_ky_k = P_k^Tr_0$. Note that the Galerkin condition

$$P_k^T(b - Ax_k) = P_k^T(r_0 - AQ_ky_k) = 0$$

holds.
As might be expected, it is possible to develop recursions so that $x_k$ can be computed as a simple combination of $x_{k-1}$ and $q_{k-1}$, instead of as a linear combination of all the previous $q$-vectors.
The BiCG method is subject to serious breakdown because of its dependence on the unsymmetric Lanczos process. However, by relying on
a look-ahead Lanczos procedure it is possible to overcome some of these 
difficulties. 
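For concreteness, here is a pure-Python sketch of the standard BiCG short recurrences. The iterate, the residual, and a "shadow" residual driven by $A^T$ are all updated with a constant amount of work per step. The sketch takes $s_0 = r_0$, a common but assumed choice. It raises an error on exact breakdown instead of using look-ahead, so it illustrates the recurrences only and is not a robust solver. Names and the test matrix are illustrative assumptions.

```python
import math


def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def matvec_t(A, y):
    """A^T y for a list-of-rows matrix A."""
    return [sum(A[i][j] * y[i] for i in range(len(A))) for j in range(len(A[0]))]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def bicg(A, b, x0, tol=1e-10, maxit=None):
    """Biconjugate gradients with shadow residual s_0 = r_0; returns (x_k, k)."""
    maxit = 4 * len(b) if maxit is None else maxit
    x = list(x0)
    r = [bi - ai for bi, ai in zip(b, matvec(A, x))]
    s = list(r)                         # shadow residual, driven by A^T
    p, pt = list(r), list(s)            # search directions for A and A^T
    rho = dot(s, r)
    bnorm = math.sqrt(dot(b, b))
    k = 0
    while math.sqrt(dot(r, r)) > tol * bnorm and k < maxit:
        k += 1
        q, qt = matvec(A, p), matvec_t(A, pt)
        sigma = dot(pt, q)
        if rho == 0.0 or sigma == 0.0:
            raise ArithmeticError("BiCG breakdown (no look-ahead in this sketch)")
        alpha = rho / sigma
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        s = [si - alpha * qti for si, qti in zip(s, qt)]
        rho_new = dot(s, r)
        beta = rho_new / rho
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        pt = [si + beta * pti for si, pti in zip(s, pt)]
        rho = rho_new
    return x, k


n = 5
A = [[5.0 if i == j else 2.0 if j == i + 1 else -1.0 if j == i - 1 else 0.0
      for j in range(n)] for i in range(n)]
b = [1.0, 0.0, 2.0, -1.0, 3.0]
x, steps = bicg(A, b, [0.0] * n)
print("BiCG steps:", steps)
```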


10.4.7 QMR 


Another iteration that runs off of the unsymmetric Lanczos process is the quasi-minimum residual (QMR) method of Freund and Nachtigal (1991). As in BiCG the $k$th iterate has the form $x_k = x_0 + Q_ky_k$. It is easy to show that after $k$ steps in (9.4.7) we have the factorization

$$AQ_k = Q_{k+1}\tilde{T}_k$$

where $\tilde{T}_k \in \mathbb{R}^{(k+1)\times k}$ is tridiagonal. It follows that if $\rho q_1 = b - Ax_0$, then

$$\begin{aligned}
b - Ax_k &= b - A(x_0 + Q_ky_k) \\
&= r_0 - AQ_ky_k \\
&= r_0 - Q_{k+1}\tilde{T}_ky_k \\
&= Q_{k+1}(\rho e_1 - \tilde{T}_ky_k).
\end{aligned}$$

If $y_k$ is chosen to minimize the 2-norm of this vector, then in exact arithmetic $x_0 + Q_ky_k$ defines the GMRES iterate. In QMR, $y_k$ is chosen to minimize

$$\|\rho e_1 - \tilde{T}_ky_k\|_2.$$


10.4.8 Summary 


The methods that we have presented do not submit to a linear ranking. 
The choice of a technique is complicated and depends on a host of factors. 
A particularly cogent assessment of the major algorithms is given in Barrett
et al. (1993).


Problems 


P10.4.1 Analogous to (10.2.16), develop efficient implementations of the CGNR, CGNE, and Conjugate Residual methods.


P10.4.2 Establish the mathematical equivalence of the CGNR and the LSQR method outlined in §9.3.4.


P10.4.3 Prove (10.4.3). 


P10.4.4 Develop an efficient preconditioned GMRES implementation, proceeding as we did in §10.3 for the preconditioned conjugate gradient method. (See (10.3.2) and (10.3.3) in particular.)


P10.4.5 Prove that the GMRES least squares problem has full rank. 


Notes and References for Sec. 10.4 


The following papers serve as excellent introductions to the world of unsymmetric iter- 
ation: 
S. Eisenstat, H. Elman, and M. Schultz (1983). “Variational Iterative Methods for Nonsymmetric Systems of Equations,” SIAM J. Num. Anal. 20, 345-357.

R.W. Freund, G.H. Golub, and N. Nachtigal (1992). “Iterative Solution of Linear Systems,” Acta Numerica 1, 57-100.

N. Nachtigal, S. Reddy, and L. Trefethen (1992). “How Fast Are Nonsymmetric Matrix Iterations,” SIAM J. Matrix Anal. Appl. 13, 778-795.

A. Greenbaum and L.N. Trefethen (1994). “GMRES/CR and Arnoldi/Lanczos as Matrix 
Approximation Problems,” SIAM J. Sci. Comp. 15, 359-368. 


Krylov space methods and analysis are featured in the following papers: 


W.E. Arnoldi (1951). “The Principle of Minimized Iterations in the Solution of the Matrix Eigenvalue Problem,” Quart. Appl. Math. 9, 17-29.

Y. Saad (1981). “Krylov Subspace Methods for Solving Large Unsymmetric Linear Systems,” Math. Comp. 37, 105-126.

Y. Saad (1984). “Practical Use of Some Krylov Subspace Methods for Solving Indefinite and Nonsymmetric Linear Systems,” SIAM J. Sci. and Stat. Comp. 5, 203-228.

Y. Saad (1989). “Krylov Subspace Methods on Supercomputers,” SIAM J. Sci. and Stat. Comp. 10, 1200-1322.

C.-M. Huang and D.P. O’Leary (1993). “A Krylov Multisplitting Algorithm for Solving Linear Systems of Equations,” Lin. Alg. and Its Applic. 194, 9-29.

C.C. Paige, B.N. Parlett, and H.A. Van Der Vorst (1995). “Approximate Solutions and Eigenvalue Bounds from Krylov Subspaces,” Numer. Linear Algebra with Applic. 2, 115-134.


References for the GMRES method include 


Y. Saad and M. Schultz (1986). “GMRES: A Generalized Minimal Residual Algorithm for Solving Nonsymmetric Linear Systems,” SIAM J. Scientific and Stat. Comp. 7, 856-869.

H.F. Walker (1988). “Implementation of the GMRES Method Using Householder Transformations,” SIAM J. Sci. Stat. Comp. 9, 152-163.

C. Vuik and H.A. van der Vorst (1992). “A Comparison of Some GMRES-like Methods,” Lin. Alg. and Its Applic. 160, 131-162.

N. Nachtigal, L. Reichel, and L. Trefethen (1992). “A Hybrid GMRES Algorithm for Nonsymmetric Linear Systems,” SIAM J. Matrix Anal. Appl. 13, 796-825.

Y. Saad (1993). “A Flexible Inner-Outer Preconditioned GMRES Algorithm,” SIAM J. Sci. Comput. 14, 461-469.

Z. Bai, D. Hu, and L. Reichel (1994). “A Newton Basis GMRES Implementation,” IMA J. Num. Anal. 14, 563-581.

R.B. Morgan (1995). “A Restarted GMRES Method Augmented with Eigenvectors,” SIAM J. Matrix Anal. Applic. 16, 1154-1171.


Preconditioning ideas for unsymmetric problems are discussed in the following papers: 


Y. Saad (1988). “Preconditioning Techniques for Indefinite and Nonsymmetric Linear 
Systems,” J. Comput. Appl. Math. 24, 89-105. 

L. Yu. Kolotilina and A. Yu. Yeremin (1993). “Factorized Sparse Approximate Inverse 
Preconditioning I: Theory,” SIAM J. Matrix Anal. Applic. 14, 45-58. 

I.E. Kaporin (1994). “New Convergence Results and Preconditioning Strategies for the 
Conjugate Gradient Method,” Num. Lin. Alg. Applic. 1, 179-210. 

L. Yu. Kolotilina and A. Yu. Yeremin (1995). “Factorized Sparse Approximate Inverse 
Preconditioning II: Solution of 3D FE Systems on Massively Parallel Computers,” 
Intern. J. High Speed Comput. 7, 191-215. 

H. Elman (1996). “Fast Nonsymmetric Iterations and Preconditioning for Navier-Stokes 
Equations,” SIAM J. Sci. Comput. 17, 33-46. 
M. Benzi, C.D. Meyer, and M. Tuma (1996). “A Sparse Approximate Inverse Preconditioner for the Conjugate Gradient Method,” SIAM J. Sci. Comput. 17, to appear.


Some representative papers concerned with the development of nonsymmetric conjugate 
gradient procedures include 


D.M. Young and K.C. Jea (1980). “Generalized Conjugate Gradient Acceleration of 
Nonsymmetrizable Iterative Methods,” Lin. Alg. and Its Applic. 34, 159-94. 

O. Axelsson (1980). “Conjugate Gradient Type Methods for Unsymmetric and Incon- 
sistent Systems of Linear Equations,” Lin. Alg. and Its Applic. 29, 1-16. 

K.C. Jea and D.M. Young (1983). “On the Simplification of Generalized Conjugate 
Gradient Methods for Nonsymmetrizable Linear Systems,” Lin. Alg. and Its Applic. 
52/53, 399-417. 

V. Faber and T. Manteuffel (1984). “Necessary and Sufficient Conditions for the
Existence of a Conjugate Gradient Method,” SIAM J. Numer. Anal. 21, 352-362.

Y. Saad and M. Schultz (1985). “Conjugate Gradient-Like Algorithms for Solving Non- 
symmetric Linear Systems,” Math. Comp. 44, 417-424. 

H.A. Van der Vorst (1986). “An Iterative Solution Method for Solving f(A)x = b Using
Krylov Subspace Information Obtained for the Symmetric Positive Definite Matrix
A,” J. Comp. and App. Math. 18, 249-263. 

M.A. Saunders, H.D. Simon, and E.L. Yip (1988). “Two Conjugate Gradient-Type 
Methods for Unsymmetric Linear Equations,” SIAM J. Num. Anal. 25, 927-940. 

R. Freund (1992). “Conjugate Gradient-Type Methods for Linear Systems with Complex 
Symmetric Coefficient Matrices,” SIAM J. Sci. Statist. Comput. 13, 425-448. 


More Lanczos-based solvers are discussed in 


Y. Saad (1982). “The Lanczos Biorthogonalization Algorithm and Other Oblique Pro- 
jection Methods for Solving Large Unsymmetric Systems,” SIAM J. Numer. Anal. 
19, 485-506. 

Y. Saad (1987). “On the Lanczos Method for Solving Symmetric Systems with Several 
Right Hand Sides,” Math. Comp. 48, 651-662. 

C. Brezinski and H. Sadok (1991). “Avoiding Breakdown in the CGS Algorithm,” Nu- 
mer. Alg. 1, 199-206. 

C. Brezinski, M. Zaglia, and H. Sadok (1992). “A Breakdown Free Lanczos Type Algo- 
rithm for Solving Linear Systems,” Numer. Math. 63, 29-38. 

S.K. Kim and A.T. Chronopoulos (1991). “A Class of Lanczos-Like Algorithms Imple- 
mented on Parallel Computers,” Parallel Comput. 17, 763-778. 

W. Joubert (1992). “Lanczos Methods for the Solution of Nonsymmetric Systems of 
Linear Equations,” SIAM J. Matrix Anal. Appl. 13, 926-943.

R.W. Freund, M. Gutknecht, and N. Nachtigal (1993). “An Implementation of the 
Look-Ahead Lanczos Algorithm for Non-Hermitian Matrices,” SIAM J. Sci. and 
Stat. Comp. 14, 137-158.


The QMR method is detailed in the following papers:


R.W. Freund and N. Nachtigal (1991). “QMR: A Quasi-Minimal Residual Method for 
Non-Hermitian Linear Systems,” Numer. Math. 60, 315-339. 

R.W. Freund (1993). “A Transpose-Free Quasi-Minimum Residual Algorithm for Non- 
hermitian Linear System,” SIAM J. Sci. Comput. 14, 470-482. 

R.W. Freund and N.M. Nachtigal (1994). “An Implementation of the QMR Method 
Based on Coupled Two-term Recurrences,” SIAM J. Sci. Comp. 15, 313-337. 


The residuals in BiCG tend to display erratic behavior, prompting the development of
stabilizing techniques:
H. van der Vorst (1992). “BiCGSTAB: A Fast and Smoothly Converging Variant of the
Bi-CG for the Solution of Nonsymmetric Linear Systems,” SIAM J. Sci. and Stat.
Comp. 13, 631-644.

M. Gutknecht (1993). “Variants of BiCGSTAB for Matrices with Complex Spectrum,”
SIAM J. Sci. and Stat. Comp. 14, 1020-1033. 

G.L.G. Sleijpen and D.R. Fokkema (1993). “BiCGSTAB(ℓ) for Linear Equations
Involving Unsymmetric Matrices with Complex Spectrum,” Electronic Transactions
on Numerical Analysis 1, 11-32.

C. Brezinski and M. Redivo-Zaglia (1995). “Look-Ahead in BiCGSTAB and Other 
Product-Type Methods for Linear Systems,” BIT 35, 169-201.


In some applications it is awkward to produce matrix-vector product code for both $Ax$
and $A^Tx$. Transpose-free methods are popular in this context. See


P. Sonneveld (1989). “CGS, A Fast Lanczos-Type Solver for Nonsymmetric Linear
Systems,” SIAM J. Sci. and Stat. Comp. 10, 36-52.

G. Radicati di Brozolo and Y. Robert (1989). “Parallel Conjugate Gradient-like
Algorithms for Solving Sparse Nonsymmetric Linear Systems on a Vector Multiprocessor,”
Parallel Computing 11, 233-240.

C. Brezinski and M. Redivo-Zaglia (1994). “Treatment of Near-Breakdown in the CGS
Algorithms,” Numerical Algorithms 7, 33-73.

E.M. Kasenally (1995). “GMBACK: A Generalized Minimum Backward Error Algorithm
for Nonsymmetric Linear Systems,” SIAM J. Sci. Comp. 16, 698-719.

C.C. Paige, B.N. Parlett, and H.A. van der Vorst (1995). “Approximate Solutions and
Eigenvalue Bounds from Krylov Subspaces,” Num. Lin. Alg. with Applic. 2, 115-133.

M. Hochbruck and Ch. Lubich (1996). “On Krylov Subspace Approximations to the
Matrix Exponential Operator,” SIAM J. Numer. Anal., to appear.

M. Hochbruck and Ch. Lubich (1996). “Error Analysis of Krylov Methods in a Nutshell,”
SIAM J. Sci. Comput., to appear.


Connections between the pseudoinverse of a rectangular matrix A and the conjugate
gradient method applied to $A^TA$ are pointed out in the paper


M. Hestenes (1975). “Pseudoinverses and Conjugate Gradients,” CACM 18, 40-43. 


Chapter 11 


Functions of Matrices 


§11.1 Eigenvalue Methods
§11.2 Approximation Methods
§11.3 The Matrix Exponential 


Computing a function f(A) of an n-by-n matrix A is a frequently occurring
problem in control theory and other application areas. Roughly
speaking, if the scalar function f(z) is defined on $\lambda(A)$, then f(A) is
defined by substituting “A” for “z” in the “formula” for f(z). For example,
if $f(z) = (1+z)/(1-z)$ and $1 \notin \lambda(A)$, then $f(A) = (I+A)(I-A)^{-1}$.

The computations get particularly interesting when the function f is 
transcendental. One approach in this more complicated situation is to 
compute an eigenvalue decomposition $A = YBY^{-1}$ and use the formula
$f(A) = Yf(B)Y^{-1}$. If B is sufficiently simple, then it is often possible
to calculate f(B) directly. This is illustrated in §11.1 for the Jordan and
Schur decompositions. Not surprisingly, reliance on the latter decomposition
results in a more stable f(A) procedure.

Another class of methods for the matrix function problem is to approx- 
imate the desired function f(A) with an easy-to-calculate function g(A). 
For example, g might be a truncated Taylor series approximation to f. Error
bounds associated with the approximation of matrix functions are given in
§11.2.

In the last section we discuss the special and very important problem 
of computing the matrix exponential $e^{A}$.


Before You Begin 


Chapters 1, 2, 3, 7 and 8 are assumed. Within this chapter there are 
the following dependencies: 
§11.1 → §11.2 → §11.3


Complementary references include Mirsky (1955), Gantmacher (1959),
Bellman (1969), and Horn and Johnson (1991). Some Matlab functions
important to this chapter are expm, expm1, expm2, expm3, logm, sqrtm, and funm.


11.1 Eigenvalue Methods 


Given an n-by-n matrix A and a scalar function f(z), there are several 
ways to define the matrix function f(A). A very informal definition might 
be to substitute “A” for “z” in the formula for f(z). For example, if $p(z)
= 1 + z$ and $r(z) = (1 - z/2)^{-1}(1 + z/2)$ for $z \ne 2$, then it is certainly
reasonable to define p(A) and r(A) by


$$p(A) = I + A$$

$$r(A) = \left(I - \frac{A}{2}\right)^{-1}\left(I + \frac{A}{2}\right), \qquad 2 \notin \lambda(A).$$


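This “A-for-z” substitution also works for transcendental functions, i.e.,

$$e^{A} = \sum_{k=0}^{\infty}\frac{A^k}{k!}$$

and

$$\sin(A) = \sum_{k=0}^{\infty}(-1)^k\frac{A^{2k+1}}{(2k+1)!}.$$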

To make subsequent algorithmic developments precise, however, we need a
more rigorous definition of f(A).


11.1.1 A Definition 


There are many ways to establish rigorously the notion of a matrix function. 
See Rinehart (1955). Perhaps the most elegant approach is in terms of a
line integral. Suppose f(z) is analytic inside and on a closed contour Γ which
encircles $\lambda(A)$. We define f(A) to be the matrix


$$f(A) = \frac{1}{2\pi i}\oint_{\Gamma} f(z)(zI - A)^{-1}\,dz. \qquad (11.1.1)$$


This definition is immediately recognized as a matrix version of the Cauchy 
integral theorem. The integral is defined on an element-by-element basis: 


$$f(A) = (f_{kj}) \quad\Longrightarrow\quad f_{kj} = \frac{1}{2\pi i}\oint_{\Gamma} f(z)\, e_k^T(zI - A)^{-1}e_j\,dz.$$


Notice that the entries of $(zI - A)^{-1}$ are analytic on Γ and that f(A) is
defined whenever f(z) is analytic in a neighborhood of $\lambda(A)$.
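To make definition (11.1.1) concrete, the integral can be approximated numerically. The following NumPy code is an illustration only, not a method advocated in the text. It assumes Γ is a circle whose user-chosen center and radius enclose $\lambda(A)$, and it applies the trapezoidal rule, which converges very quickly for smooth periodic integrands. The function name `contour_fA` and the test matrix are hypothetical choices.

```python
import numpy as np

def contour_fA(f, A, center, radius, npts=128):
    """Trapezoidal-rule approximation of (1/(2 pi i)) * integral of f(z) (zI - A)^{-1} dz
    over the circle |z - center| = radius (assumed to enclose every eigenvalue of A)."""
    n = A.shape[0]
    I = np.eye(n)
    F = np.zeros((n, n), dtype=complex)
    for k in range(npts):
        w = np.exp(2j * np.pi * k / npts)      # e^{i theta_k}
        z = center + radius * w                # quadrature node on the contour
        # dz = i * radius * w * dtheta: the factor i cancels the i in 1/(2 pi i),
        # and dtheta = 2 pi / npts cancels the 2 pi, leaving a plain average.
        F += f(z) * np.linalg.solve(z * I - A, I) * (radius * w)
    return F / npts

A = np.array([[2.0, 1.0], [1.0, 3.0]])         # symmetric, eigenvalues (5 -/+ sqrt(5))/2
lam, V = np.linalg.eigh(A)
expA_ref = V @ np.diag(np.exp(lam)) @ V.T      # reference e^A from the eigendecomposition
expA_contour = contour_fA(np.exp, A, center=2.5, radius=3.0)
print(np.max(np.abs(expA_contour - expA_ref)))
```

A is symmetric here, so $e^A$ can be checked against an eigendecomposition. The same quadrature applies to nonnormal A, but the algorithms developed later in the chapter are more practical.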
11.1.2 The Jordan Characterization


Although fairly useless from the computational point of view, the definition 
(11.1.1) can be used to derive more practical characterizations of f(A). For 
example, if f(A) is defined and 


$$A = XBX^{-1} = X\,\mathrm{diag}(B_1,\ldots,B_p)\,X^{-1}, \qquad B_i \in \mathbb{C}^{n_i\times n_i},$$
then it is easy to verify that
$$f(A) = Xf(B)X^{-1} = X\,\mathrm{diag}(f(B_1),\ldots,f(B_p))\,X^{-1}. \qquad (11.1.2)$$
For the case when the $B_i$ are Jordan blocks we obtain the following:


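Theorem 11.1.1 Let $X^{-1}AX = \mathrm{diag}(J_1,\ldots,J_p)$ be the Jordan canonical
form (JCF) of $A \in \mathbb{C}^{n\times n}$ with

$$J_i = \begin{pmatrix} \lambda_i & 1 & & 0 \\ 0 & \lambda_i & \ddots & \\ \vdots & & \ddots & 1 \\ 0 & \cdots & 0 & \lambda_i \end{pmatrix}$$

being an $m_i$-by-$m_i$ Jordan block. If f(z) is analytic on an open set
containing $\lambda(A)$, then

$$f(A) = X\,\mathrm{diag}(f(J_1),\ldots,f(J_p))\,X^{-1}$$

where

$$f(J_i) = \begin{pmatrix} f(\lambda_i) & f^{(1)}(\lambda_i) & \cdots & \dfrac{f^{(m_i-1)}(\lambda_i)}{(m_i-1)!} \\ 0 & f(\lambda_i) & \ddots & \vdots \\ \vdots & & \ddots & f^{(1)}(\lambda_i) \\ 0 & \cdots & 0 & f(\lambda_i) \end{pmatrix}.$$
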

Proof. In view of the remarks preceding the statement of the theorem, it
suffices to examine f(G) where

$$G = \lambda I + E, \qquad E = (\delta_{i,j-1}),$$

is a q-by-q Jordan block. Suppose (zI − G) is nonsingular. Since

$$(zI - G)^{-1} = \sum_{k=0}^{q-1}\frac{E^k}{(z-\lambda)^{k+1}},$$

it follows from Cauchy’s integral theorem that

$$f(G) = \sum_{k=0}^{q-1}\left[\frac{1}{2\pi i}\oint_{\Gamma}\frac{f(z)}{(z-\lambda)^{k+1}}\,dz\right]E^k = \sum_{k=0}^{q-1}\frac{f^{(k)}(\lambda)}{k!}E^k.$$

The theorem follows from the observation that $E^k = (\delta_{i,j-k})$. □
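Theorem 11.1.1 says that the kth superdiagonal of $f(J_i)$ is constant and equal to $f^{(k)}(\lambda_i)/k!$. The code below builds f(J) directly from that formula. It assumes the caller supplies the derivative values $f^{(k)}(\lambda)$, k = 0, ..., m − 1, and the function name `f_jordan` is hypothetical. For f = exp every derivative equals $e^\lambda$, so the result can be compared with a long truncated Taylor series of J.

```python
import math
import numpy as np

def f_jordan(derivs, m):
    """f(J) for the m-by-m Jordan block J = lam*I + E, given derivs[k] = f^{(k)}(lam)."""
    F = np.zeros((m, m))
    for k in range(m):
        # f^{(k)}(lam)/k! fills the kth superdiagonal (Theorem 11.1.1)
        F += (derivs[k] / math.factorial(k)) * np.eye(m, k=k)
    return F

lam, m = 0.3, 4
J = lam * np.eye(m) + np.eye(m, k=1)
F = f_jordan([math.exp(lam)] * m, m)           # f = exp: all derivatives equal e^lam

# Reference: sum_{k<40} J^k/k!, which is very accurate here because ||J|| is small.
S, term = np.zeros((m, m)), np.eye(m)
for k in range(40):
    S += term
    term = term @ J / (k + 1)
```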


Corollary 11.1.2 If $A \in \mathbb{C}^{n\times n}$, $A = X\,\mathrm{diag}(\lambda_1,\ldots,\lambda_n)\,X^{-1}$, and f(A) is
defined, then
$$f(A) = X\,\mathrm{diag}(f(\lambda_1),\ldots,f(\lambda_n))\,X^{-1}.$$


Proof. The Jordan blocks are all 1-by-1. □


These results illustrate the close connection between f(A) and the eigen- 
system of A. Unfortunately, the JCF approach to the matrix function 
problem has dubious computational merit unless A is diagonalizable with 
a well-conditioned matrix of eigenvectors. Indeed, rounding errors of order 
$u\kappa_2(X)$ can be expected to contaminate the computed result, since a
linear system involving the matrix X must be solved. The following example
suggests that ill-conditioned similarity transformations should be avoided 
when computing a function of a matrix. 


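Example 11.1.1 If

$$A = \begin{pmatrix} 1+10^{-5} & 1 \\ 0 & 1-10^{-5} \end{pmatrix},$$

then any matrix of eigenvectors is a column scaled version of

$$X = \begin{pmatrix} 1 & -1 \\ 0 & 2\cdot 10^{-5} \end{pmatrix}$$

and has a 2-norm condition number of order $10^5$. Using a computer with machine
precision $u = 10^{-7}$ we find

$$fl\left[X\,\mathrm{diag}\left(\exp(1+10^{-5}),\ \exp(1-10^{-5})\right)X^{-1}\right] = \begin{pmatrix} 2.718307 & 2.750000 \\ 0.000000 & 2.718254 \end{pmatrix}$$

while

$$e^{A} = \begin{pmatrix} 2.718309 & 2.718282 \\ 0.000000 & 2.718255 \end{pmatrix}.$$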
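Part of Example 11.1.1 can be reproduced in IEEE double precision with NumPy. This code does not emulate the $u = 10^{-7}$ arithmetic of the example. It computes $\kappa_2(X)$ for the eigenvector matrix above and compares the eigenvector formula with the exact exponential of the 2-by-2 triangular matrix, whose (1,2) entry is the divided difference $(e^{\lambda_1} - e^{\lambda_2})/(\lambda_1 - \lambda_2)$. With $u \approx 1.1\times 10^{-16}$ the predicted loss $u\kappa_2(X)$ is still small, which is why the example needs a shorter word length to make its point. Variable names are illustrative.

```python
import numpy as np

d = 1e-5
A = np.array([[1 + d, 1.0], [0.0, 1 - d]])
X = np.array([[1.0, -1.0], [0.0, 2 * d]])      # columns are eigenvectors of A
lam = np.array([1 + d, 1 - d])
assert np.allclose(A @ X, X @ np.diag(lam))    # sanity check: A X = X diag(lam)

kappa = np.linalg.cond(X, 2)                   # about 1e5
F_eig = X @ np.diag(np.exp(lam)) @ np.linalg.inv(X)

# Exact exp(A) for a 2-by-2 upper triangular matrix: the off-diagonal entry
# is t_12 times the divided difference of exp at the two eigenvalues (t_12 = 1).
F_exact = np.array([[np.exp(lam[0]), (np.exp(lam[0]) - np.exp(lam[1])) / (lam[0] - lam[1])],
                    [0.0, np.exp(lam[1])]])
err = np.max(np.abs(F_eig - F_exact))
print(f"kappa_2(X) = {kappa:.2e}, max error = {err:.2e}")
```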


11.1.3 A Schur Decomposition Approach 


Some of the difficulties associated with the Jordan approach to the matrix 
function problem can be circumvented by relying upon the Schur
decomposition. If $A = QTQ^H$ is the Schur decomposition of A, then


$$f(A) = Qf(T)Q^H.$$


For this to be effective, we need an algorithm for computing functions of 
upper triangular matrices. Unfortunately, an explicit expression for f(T) 
is very complicated as the following theorem shows. 
Theorem 11.1.3 Let $T = (t_{ij})$ be an n-by-n upper triangular matrix with
$\lambda_i = t_{ii}$ and assume f(T) is defined. If $f(T) = (f_{ij})$, then $f_{ij} = 0$ if $i > j$,
$f_{ij} = f(\lambda_i)$ for $i = j$, and for all $i < j$ we have

$$f_{ij} = \sum_{(s_0,\ldots,s_k)\in S_{ij}} t_{s_0,s_1}t_{s_1,s_2}\cdots t_{s_{k-1},s_k}\,f[\lambda_{s_0},\ldots,\lambda_{s_k}],$$

where $S_{ij}$ is the set of all strictly increasing sequences of integers that start
at i and end at j and $f[\lambda_{s_0},\ldots,\lambda_{s_k}]$ is the kth order divided difference of
f at $\{\lambda_{s_0},\ldots,\lambda_{s_k}\}$.


Proof. See Descloux (1963), Davis (1973), or Van Loan (1975). □
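Theorem 11.1.3 can be coded directly, and doing so shows why it is impractical: the inner sum runs over every subset of the indices strictly between i and j. The code below uses hypothetical function names. It assumes distinct diagonal entries, so that the divided differences can be formed with the usual recursive quotient, and it checks itself on $f(z) = z^3$, for which $f(T) = T^3$.

```python
import itertools
import numpy as np

def divided_difference(f, xs):
    """kth order divided difference f[x0, ..., xk] for distinct points xs."""
    if len(xs) == 1:
        return f(xs[0])
    return (divided_difference(f, xs[1:]) - divided_difference(f, xs[:-1])) / (xs[-1] - xs[0])

def f_triangular_dd(f, T):
    """f(T) for upper triangular T via the sequence sum of Theorem 11.1.3 (exponential cost)."""
    n = T.shape[0]
    lam = np.diag(T)
    F = np.zeros((n, n))
    for i in range(n):
        F[i, i] = f(lam[i])
        for j in range(i + 1, n):
            inner = range(i + 1, j)
            # every strictly increasing sequence i = s0 < s1 < ... < sk = j
            for r in range(len(inner) + 1):
                for mid in itertools.combinations(inner, r):
                    s = (i, *mid, j)
                    weight = np.prod([T[s[a], s[a + 1]] for a in range(len(s) - 1)])
                    F[i, j] += weight * divided_difference(f, [lam[t] for t in s])
    return F

T = np.array([[1.0, 2.0, 3.0], [0.0, 3.0, 4.0], [0.0, 0.0, 5.0]])
F = f_triangular_dd(lambda z: z**3, T)         # should equal T @ T @ T
```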


Computing f(T) via Theorem 11.1.3 would require $O(2^n)$ flops. Fortunately,
Parlett (1974) has derived an elegant recursive method for determining
the strictly upper triangular portion of the matrix F = f(T). It
requires only $2n^3/3$ flops and can be derived from the following commutativity
result:

$$FT = TF. \qquad (11.1.3)$$


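Indeed, by comparing (i, j) entries in this equation, we find

$$\sum_{k=i}^{j} f_{ik}t_{kj} = \sum_{k=i}^{j} t_{ik}f_{kj}, \qquad j > i,$$

and thus, if $t_{ii}$ and $t_{jj}$ are distinct,

$$f_{ij} = t_{ij}\,\frac{f_{jj}-f_{ii}}{t_{jj}-t_{ii}} + \sum_{k=i+1}^{j-1}\frac{t_{ik}f_{kj}-f_{ik}t_{kj}}{t_{jj}-t_{ii}}. \qquad (11.1.4)$$
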

From this we conclude that $f_{ij}$ is a linear combination of its neighbors to its
left and below in the matrix F. For example, the entry $f_{25}$ depends upon
$f_{22}$, $f_{23}$, $f_{24}$, $f_{55}$, $f_{45}$, and $f_{35}$. Because of this, the entire upper triangular
portion of F can be computed one superdiagonal at a time beginning with
the diagonal, $f(t_{11}),\ldots,f(t_{nn})$. The complete procedure is as follows:


Algorithm 11.1.1 This algorithm computes the matrix function F =
f(T) where T is upper triangular with distinct eigenvalues and f is defined
on λ(T).


for i = 1:n
    f_ii = f(t_ii)
end
for p = 1:n-1
    for i = 1:n-p
        j = i + p
        s = t_ij (f_jj - f_ii)
        for k = i+1:j-1
            s = s + t_ik f_kj - f_ik t_kj
        end
        f_ij = s/(t_jj - t_ii)
    end
end


This algorithm requires $2n^3/3$ flops. Assuming that $T = Q^HAQ$ is the
Schur form of A, $f(A) = QFQ^H$ where F = f(T). Clearly, most of the
work in computing f(A) by this approach is in the computation of the 
Schur decomposition, unless f is extremely expensive to evaluate. 


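Example 11.1.2 If

$$T = \begin{pmatrix} 1 & 2 & 3 \\ 0 & 3 & 4 \\ 0 & 0 & 5 \end{pmatrix}$$

and $f(z) = (1+z)/z$, then $F = (f_{ij}) = f(T)$ is defined by

$$\begin{aligned}
f_{11} &= (1+1)/1 = 2 \\
f_{22} &= (1+3)/3 = 4/3 \\
f_{33} &= (1+5)/5 = 6/5 \\
f_{12} &= t_{12}(f_{22}-f_{11})/(t_{22}-t_{11}) = -2/3 \\
f_{23} &= t_{23}(f_{33}-f_{22})/(t_{33}-t_{22}) = -4/15 \\
f_{13} &= [t_{13}(f_{33}-f_{11}) + (t_{12}f_{23}-f_{12}t_{23})]/(t_{33}-t_{11}) = -1/15.
\end{aligned}$$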
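Algorithm 11.1.1 translates almost line for line into NumPy. The function name `parlett` is hypothetical and indices are 0-based. Like the algorithm, the code assumes distinct diagonal entries, and it checks itself against Example 11.1.2. It then illustrates $f(A) = QFQ^T$ for a real A built from the known triangular T and a random orthogonal Q. For a general A one would first compute a Schur form with a library routine, which this code does not do.

```python
import numpy as np

def parlett(f, T):
    """Algorithm 11.1.1: F = f(T) for upper triangular T with distinct diagonal entries."""
    n = T.shape[0]
    F = np.zeros_like(T, dtype=np.result_type(T, float))
    for i in range(n):
        F[i, i] = f(T[i, i])
    for p in range(1, n):                      # work outward one superdiagonal at a time
        for i in range(n - p):
            j = i + p
            s = T[i, j] * (F[j, j] - F[i, i])
            for k in range(i + 1, j):
                s += T[i, k] * F[k, j] - F[i, k] * T[k, j]
            F[i, j] = s / (T[j, j] - T[i, i])  # recurrence (11.1.4)
    return F

# Example 11.1.2
T = np.array([[1.0, 2.0, 3.0], [0.0, 3.0, 4.0], [0.0, 0.0, 5.0]])
F = parlett(lambda z: (1 + z) / z, T)
expected = np.array([[2, -2/3, -1/15], [0, 4/3, -4/15], [0, 0, 6/5]])
print(np.allclose(F, expected))                # True

# f(A) = Q f(T) Q^T for A = Q T Q^T with Q orthogonal
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
A = Q @ T @ Q.T
fA = Q @ F @ Q.T
fA_direct = (np.eye(3) + A) @ np.linalg.inv(A)  # (I + A)A^{-1}, since f(z) = (1 + z)/z
```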


11.1.4 A Block Schur Approach 


If A has close or multiple eigenvalues, then Algorithm 11.1.1 leads to poor 
results. In this case, it is advisable to use a block version of Algorithm 
11.1.1. We outline such a procedure due to Parlett (1974a). The first 
step is to choose Q in the Schur decomposition such that close or multiple 
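eigenvalues are clustered in blocks $T_{11},\ldots,T_{pp}$ along the diagonal of T. In
particular, we must compute a partitioning

$$T = \begin{pmatrix} T_{11} & T_{12} & \cdots & T_{1p} \\ 0 & T_{22} & \cdots & T_{2p} \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & T_{pp} \end{pmatrix}, \qquad F = \begin{pmatrix} F_{11} & F_{12} & \cdots & F_{1p} \\ 0 & F_{22} & \cdots & F_{2p} \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & F_{pp} \end{pmatrix},$$

where $\lambda(T_{ii}) \cap \lambda(T_{jj}) = \emptyset$, $i \ne j$. The actual determination of the block
sizes can be done using the methods of §7.6.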
Next, we compute the submatrices $F_{ii} = f(T_{ii})$ for i = 1:p. Since the
eigenvalues of $T_{ii}$ are presumably close, these calculations require special
methods. (Some possibilities are discussed in the next two sections.) Once 
the diagonal blocks of F are known, the blocks in the strict upper triangle 
of F can be found recursively, as in the scalar case. To derive the governing 
equations, we equate (i,j) blocks in FT = TF for i < j and obtain the 
following generalization of (11.1.4): 


$$F_{ij}T_{jj} - T_{ii}F_{ij} = T_{ij}F_{jj} - F_{ii}T_{ij} + \sum_{k=i+1}^{j-1}\left(T_{ik}F_{kj} - F_{ik}T_{kj}\right). \qquad (11.1.5)$$


This is a linear system whose unknowns are the elements of the block $F_{ij}$
and whose right-hand side is “known” if we compute the $F_{ij}$ one block
super-diagonal at a time. We can solve (11.1.5) using the Bartels-Stewart 
algorithm (Algorithm 7.6.2). 

The block Schur approach described here is useful when computing real 
functions of real matrices. After computing the real Schur form $A = QTQ^T$,
the block algorithm can be invoked in order to handle the 2-by-2 bumps 
along the diagonal of T. 
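Equation (11.1.5) is a Sylvester equation for $F_{ij}$. The text solves it with the Bartels-Stewart algorithm. This code instead vectorizes the equation with Kronecker products, a different technique chosen only because it is short; it costs $O((n_in_j)^3)$ and is sensible only for small blocks. The check uses $f(z) = z^2$, for which $F = T^2$ is known exactly. The two-block partition and the helper name `solve_offdiag_block` are assumptions of this illustration.

```python
import numpy as np

def solve_offdiag_block(Tii, Tjj, rhs):
    """Solve X @ Tjj - Tii @ X = rhs via vec(X Tjj - Tii X) = (Tjj^T kron I - I kron Tii) vec(X).
    Requires lambda(Tii) and lambda(Tjj) to be disjoint."""
    ni, nj = Tii.shape[0], Tjj.shape[0]
    K = np.kron(Tjj.T, np.eye(ni)) - np.kron(np.eye(nj), Tii)
    x = np.linalg.solve(K, rhs.flatten(order="F"))   # column-major vec
    return x.reshape((ni, nj), order="F")

T11 = np.array([[1.0, 2.0], [0.0, 1.5]])             # a cluster of close eigenvalues
T22 = np.array([[4.0, 1.0], [0.0, 5.0]])
T12 = np.array([[1.0, -1.0], [2.0, 0.5]])
T = np.block([[T11, T12], [np.zeros((2, 2)), T22]])

F11, F22 = T11 @ T11, T22 @ T22                      # diagonal blocks of f(T) for f(z) = z^2
rhs = T12 @ F22 - F11 @ T12                          # right-hand side of (11.1.5); no sum when p = 2
F12 = solve_offdiag_block(T11, T22, rhs)
```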


Problems 

P11.1.1 Using the definition (11.1.1) show that (a) Af(A) = f(A)A, (b) f(A) is upper 
triangular if A is upper triangular, and (c) f(A) is Hermitian if A is Hermitian. 
P11.1.2 Rewrite Algorithm 11.1.1 so that f(T) is computed column by column. 


P11.1.3 Suppose $A = X\,\mathrm{diag}(\lambda_i)\,X^{-1}$ where $X = [x_1,\ldots,x_n]$ and $X^{-1} = [y_1,\ldots,y_n]^H$.
Show that if f(A) is defined, then


$$f(A) = \sum_{k=1}^{n} f(\lambda_k)\,x_ky_k^H.$$


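P11.1.4 Show that if

$$T = \begin{pmatrix} T_{11} & T_{12} \\ 0 & T_{22} \end{pmatrix}, \qquad T_{11} \in \mathbb{C}^{p\times p},\ T_{22} \in \mathbb{C}^{q\times q},$$

then

$$f(T) = \begin{pmatrix} F_{11} & F_{12} \\ 0 & F_{22} \end{pmatrix}$$

where $F_{11} = f(T_{11})$ and $F_{22} = f(T_{22})$. Assume f(T) is defined.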


Notes and References for Sec. 11.1 


The contour integral representation of f(A) given in the text is useful in functional anal- 
ysis because of its generality. See 


N. Dunford and J. Schwartz (1958). Linear Operators, Part I, Interscience, New York. 


As we discussed, other definitions of f(A) are possible. However, for the matrix functions
typically encountered in practice, all these definitions are equivalent. See 
R.F. Rinehart (1955). “The Equivalence of Definitions of a Matric Function,” Amer.
Math. Monthly 62, 395-414.


Various aspects of the Jordan representation are detailed in 


J.S. Frame (1964). “Matrix Functions and Applications, Part II,” IEEE Spectrum 1
(April), 102-8. 

J.S. Frame (1964). “Matrix Functions and Applications, Part IV,” IEEE Spectrum 1 
(June), 123-31. 


The following are concerned with the Schur decomposition and its relationship to the 
f(A) problem: 


D. Davis (1973). “Explicit Functional Calculus,” Lin. Alg. and Its Applic. 6, 193-99.

J. Descloux (1963). “Bounds for the Spectral Norm of Functions of Matrices,” Numer.
Math. 5, 185-90.

C.F. Van Loan (1975). “A Study of the Matrix Exponential,” Numerical Analysis Report
No. 10, Dept. of Maths., University of Manchester, England.


Algorithm 11.1.1 and the various computational difficulties that arise when it is applied 
to a matrix having close or repeated eigenvalues are discussed in 


B.N. Parlett (1976). “A Recurrence Among the Elements of Functions of Triangular
Matrices,” Lin. Alg. and Its Applic. 14, 117-21.


A compromise between the Jordan and Schur approaches to the f(A) problem results if 
A is reduced to block diagonal form as described in §7.6.3. See 


B. Kågström (1977). “Numerical Computation of Matrix Functions,” Department of
Information Processing Report UMINF-58.77, University of Umeå, Sweden.


The sensitivity of matrix functions to perturbation is discussed in 


C.S. Kenney and A.J. Laub (1989). “Condition Estimates for Matrix Functions,” SIAM 
J. Matrix Anal. Appl. 10, 191-209.

C.S. Kenney and A.J. Laub (1994). “Small-Sample Statistical Condition Estimates for 
General Matrix Functions,” SIAM J. Sci. Comp. 15, 36-61. 


A theme in this chapter is that if A is nonnormal, then there is more to computing f(A) 
than just computing f(z) on λ(A). The pseudo-eigenvalue concept is a way of
understanding this phenomenon. See


L.N. Trefethen (1992). “Pseudospectra of Matrices,” in Numerical Analysis 1991, D.F. 
Griffiths and G.A. Watson (eds), Longman Scientific & Technical, Harlow, Essex, 
UK. 


More details are offered in §11.3.4. 


11.2 Approximation Methods 


We now consider a class of methods for computing matrix functions which at 
first glance do not appear to involve eigenvalues. These techniques are based 
on the idea that if g(z) approximates f(z) on λ(A), then g(A) approximates
f(A), e.g.,


$$e^{A} \approx I + A + \frac{A^2}{2!} + \cdots + \frac{A^q}{q!}.$$

We begin by bounding $\|f(A) - g(A)\|$ using the Jordan and Schur matrix
function representations. We follow this discussion with some comments 
on the evaluation of matrix polynomials. 


11.2.1 A Jordan Analysis


The Jordan representation of matrix functions (Theorem 11.1.1) can be 
used to bound the error in an approximant g(A) of f(A).


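Theorem 11.2.1 Let $X^{-1}AX = \mathrm{diag}(J_1,\ldots,J_p)$ be the JCF of $A \in \mathbb{C}^{n\times n}$
with

$$J_i = \begin{pmatrix} \lambda_i & 1 & & 0 \\ 0 & \lambda_i & \ddots & \\ \vdots & & \ddots & 1 \\ 0 & \cdots & 0 & \lambda_i \end{pmatrix}$$

being an $m_i$-by-$m_i$ Jordan block. If f(z) and g(z) are analytic on an open
set containing λ(A), then

$$\|f(A) - g(A)\|_2 \le \kappa_2(X)\max_{\substack{1\le i\le p \\ 0\le r\le m_i-1}} m_i\,\frac{|f^{(r)}(\lambda_i) - g^{(r)}(\lambda_i)|}{r!}.$$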


Proof. Defining $h(z) = f(z) - g(z)$ we have

$$\|f(A) - g(A)\|_2 = \|X\,\mathrm{diag}(h(J_1),\ldots,h(J_p))\,X^{-1}\|_2 \le \kappa_2(X)\max_{1\le i\le p}\|h(J_i)\|_2.$$

Using Theorem 11.1.1 and equation (2.3.8) we conclude that

$$\|h(J_i)\|_2 \le m_i\max_{0\le r\le m_i-1}\frac{|h^{(r)}(\lambda_i)|}{r!},$$

thereby proving the theorem. □


11.2.2 A Schur Analysis 


If we rely on the Schur instead of the Jordan decomposition we obtain an 
alternative bound. 
Theorem 11.2.2 Let $Q^HAQ = T = \mathrm{diag}(\lambda_i) + N$ be the Schur decomposition
of $A \in \mathbb{C}^{n\times n}$, with N being the strictly upper triangular portion of
T. If f(z) and g(z) are analytic on a closed convex set Ω whose interior
contains λ(A), then

$$\|f(A) - g(A)\|_F \le \sum_{r=0}^{n-1}\delta_r\,\frac{\|\,|N|^r\|_F}{r!}$$

where

$$\delta_r = \sup_{z\in\Omega}\left|f^{(r)}(z) - g^{(r)}(z)\right|.$$


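Proof. Let $h(z) = f(z) - g(z)$ and set $H = (h_{ij}) = h(T)$. Let $S_{ij}^{(r)}$ denote
the set of strictly increasing integer sequences $(s_0,\ldots,s_r)$ with the property
that $s_0 = i$ and $s_r = j$. Notice that

$$S_{ij} = \bigcup_{r=1}^{j-i} S_{ij}^{(r)}$$

and so from Theorem 11.1.3, we obtain the following for all i < j:

$$h_{ij} = \sum_{r=1}^{j-i}\ \sum_{s\in S_{ij}^{(r)}} n_{s_0,s_1}n_{s_1,s_2}\cdots n_{s_{r-1},s_r}\,h[\lambda_{s_0},\ldots,\lambda_{s_r}].$$

Now since Ω is convex and h analytic, we have

$$\left|h[\lambda_{s_0},\ldots,\lambda_{s_r}]\right| \le \sup_{z\in\Omega}\frac{|h^{(r)}(z)|}{r!} = \frac{\delta_r}{r!}. \qquad (11.2.1)$$

Furthermore if $|N|^r = (n_{ij}^{(r)})$ for $r \ge 1$, then it can be shown that

$$n_{ij}^{(r)} = \begin{cases} 0 & j < i + r \\ \displaystyle\sum_{s\in S_{ij}^{(r)}} |n_{s_0,s_1}|\,|n_{s_1,s_2}|\cdots|n_{s_{r-1},s_r}| & j \ge i + r. \end{cases} \qquad (11.2.2)$$

The theorem now follows by taking absolute values in the expression for
$h_{ij}$ and then using (11.2.1) and (11.2.2). □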


The bounds in the above theorems suggest that there is more to approximat- 
ing f(A) than just approximating f(z) on the spectrum of A. In particular, 
we see that if the eigensystem of A is ill-conditioned and/or A’s departure 
from normality is large, then the discrepancy between f(A) and g(A) may
be considerably larger than the maximum of $|f(z) - g(z)|$ on λ(A). Thus,
even though approximation methods avoid eigenvalue computations, they 
appear to be influenced by the structure of A's eigensystem, a point that 
we pursue further in the next section. 


Example 11.2.1 Suppose

$$A = \begin{pmatrix} -.01 & 1 & 1 \\ 0 & 0 & 1 \\ 0 & 0 & .01 \end{pmatrix}.$$

If $f(z) = e^z$ and $g(z) = 1 + z + z^2/2$, then $\|f(A) - g(A)\| \approx 10^{-5}$ in either the
Frobenius norm or the 2-norm. Since $\kappa_2(X) \approx 10^7$, the error predicted by Theorem
11.2.1 is O(1), rather pessimistic. On the other hand, the error predicted by the Schur
decomposition approach is $O(10^{-2})$.
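The first claim of Example 11.2.1 is easy to check numerically. In this NumPy code $e^A$ comes from a long Taylor sum; the helper `exp_taylor` is hypothetical and is adequate here only because $\|A\|$ is modest. Both norms of $f(A) - g(A)$ come out of order $10^{-5}$.

```python
import numpy as np

A = np.array([[-0.01, 1.0, 1.0], [0.0, 0.0, 1.0], [0.0, 0.0, 0.01]])

def exp_taylor(A, terms=30):
    """exp(A) from a long Taylor sum; fine here because ||A|| is modest."""
    S, term = np.zeros_like(A), np.eye(A.shape[0])
    for k in range(terms):
        S += term
        term = term @ A / (k + 1)
    return S

g = np.eye(3) + A + A @ A / 2                  # g(A) with g(z) = 1 + z + z^2/2
diff = exp_taylor(A) - g
err_2 = np.linalg.norm(diff, 2)
err_F = np.linalg.norm(diff, "fro")
print(f"2-norm error {err_2:.2e}, Frobenius error {err_F:.2e}")
```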


11.2.3 Taylor Approximants


A popular way of approximating a matrix function such as $e^{A}$ is through
the truncation of its Taylor series. The conditions under which a matrix 
function f(A) has a Taylor series representation are easily established. 


Theorem 11.2.3 If f(z) has a power series representation

$$f(z) = \sum_{k=0}^{\infty} c_kz^k$$

on an open disk containing λ(A), then

$$f(A) = \sum_{k=0}^{\infty} c_kA^k.$$


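Proof. We prove the theorem for the case when A is diagonalizable. In
P11.2.1, we give a hint as to how to proceed without this assumption.
Suppose $X^{-1}AX = D = \mathrm{diag}(\lambda_1,\ldots,\lambda_n)$. Using Corollary 11.1.2, we
have

$$\begin{aligned}
f(A) &= X\,\mathrm{diag}(f(\lambda_1),\ldots,f(\lambda_n))\,X^{-1} \\
&= X\,\mathrm{diag}\left(\sum_{k=0}^{\infty}c_k\lambda_1^k,\ \ldots,\ \sum_{k=0}^{\infty}c_k\lambda_n^k\right)X^{-1} \\
&= X\left(\sum_{k=0}^{\infty}c_kD^k\right)X^{-1} = \sum_{k=0}^{\infty}c_k(XDX^{-1})^k = \sum_{k=0}^{\infty}c_kA^k.
\end{aligned}$$

□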
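Several important transcendental matrix functions have particularly simple
series representations:

$$\begin{aligned}
\log(I - A) &= -\sum_{k=1}^{\infty}\frac{A^k}{k}, \qquad |\lambda| < 1,\ \lambda \in \lambda(A) \\
\sin(A) &= \sum_{k=0}^{\infty}(-1)^k\frac{A^{2k+1}}{(2k+1)!} \\
\cos(A) &= \sum_{k=0}^{\infty}(-1)^k\frac{A^{2k}}{(2k)!}
\end{aligned}$$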


The following theorem bounds the errors that arise when matrix functions 
such as these are approximated via truncated Taylor series. 


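Theorem 11.2.4 If f(z) has the Taylor series

$$f(z) = \sum_{k=0}^{\infty}\alpha_kz^k$$

on an open disk containing the eigenvalues of $A \in \mathbb{C}^{n\times n}$, then

$$\left\|f(A) - \sum_{k=0}^{q}\alpha_kA^k\right\|_2 \le \frac{n}{(q+1)!}\max_{0\le s\le 1}\left\|A^{q+1}f^{(q+1)}(As)\right\|_2.$$
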

Proof. Define the matrix E(s) by

$$f(As) = \sum_{k=0}^{q}\alpha_k(As)^k + E(s), \qquad 0 \le s \le 1. \qquad (11.2.3)$$

If $f_{ij}(s)$ is the (i, j) entry of f(As), then it is necessarily analytic and so

$$f_{ij}(s) = \sum_{k=0}^{q}\frac{f_{ij}^{(k)}(0)}{k!}s^k + \frac{f_{ij}^{(q+1)}(\varepsilon_{ij})}{(q+1)!}s^{q+1} \qquad (11.2.4)$$

where $\varepsilon_{ij}$ satisfies $0 \le \varepsilon_{ij} \le s \le 1$.

By comparing powers of s in (11.2.3) and (11.2.4) we conclude that
$e_{ij}(s)$, the (i, j) entry of E(s), has the form

$$e_{ij}(s) = \frac{f_{ij}^{(q+1)}(\varepsilon_{ij})}{(q+1)!}s^{q+1}.$$

Now $f_{ij}^{(q+1)}(s)$ is the (i, j) entry of $A^{q+1}f^{(q+1)}(As)$ and therefore

$$|e_{ij}(s)| \le \max_{0\le s\le 1}\frac{|f_{ij}^{(q+1)}(s)|}{(q+1)!} \le \max_{0\le s\le 1}\frac{\|A^{q+1}f^{(q+1)}(As)\|_2}{(q+1)!}.$$

The theorem now follows by applying (2.3.8). □

Example 11.2.2 If

$$A = \begin{pmatrix} -49 & 24 \\ -64 & 31 \end{pmatrix},$$

then

$$e^{A} = \begin{pmatrix} -0.735759 & 0.551819 \\ -1.471518 & 1.103638 \end{pmatrix}.$$

For q = 59, Theorem 11.2.4 predicts that

$$\left\|e^{A} - \sum_{k=0}^{q}\frac{A^k}{k!}\right\|_2 \le \frac{n}{(q+1)!}\max_{0\le s\le 1}\left\|A^{q+1}e^{As}\right\|_2 \le 10^{-7}.$$

However, if $u = 10^{-7}$, then we find

$$fl\left(\sum_{k=0}^{59}\frac{A^k}{k!}\right) = \begin{pmatrix} -22.25880 & -1.432766 \\ -61.49931 & -3.474280 \end{pmatrix}.$$

The problem is that some of the partial sums have large elements. For example,
$I + \cdots + A^{17}/17!$ has entries of order $10^7$. Since the machine precision is
approximately $10^{-7}$, rounding errors larger than the norm of the solution are sustained.
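Example 11.2.2 can be imitated by accumulating the same partial sums in IEEE single precision ($u \approx 6\times 10^{-8}$, close to the $u = 10^{-7}$ of the example) and in double precision. The exact exponential comes from the eigendecomposition $A = X\,\mathrm{diag}(-1,-17)\,X^{-1}$ with eigenvectors $(1,2)^T$ and $(3,4)^T$, which can be checked by hand. Names are illustrative, and the exact digits of the single-precision result depend on the platform. The point is the size of that error, not its value.

```python
import numpy as np

A = np.array([[-49.0, 24.0], [-64.0, 31.0]])

# Exact exp(A): A = X diag(-1, -17) X^{-1} with eigenvectors (1, 2) and (3, 4).
X = np.array([[1.0, 3.0], [2.0, 4.0]])
expA = X @ np.diag(np.exp([-1.0, -17.0])) @ np.linalg.inv(X)

def taylor_exp(A, q, dtype):
    """Partial sum sum_{k=0}^{q} A^k/k! accumulated in the given floating point type.
    Also returns the largest entry seen in any partial sum."""
    A = A.astype(dtype)
    S = np.zeros_like(A)
    term = np.eye(A.shape[0], dtype=dtype)
    largest = 0.0
    for k in range(q + 1):
        S = S + term
        largest = max(largest, float(np.abs(S).max()))
        term = (term @ A) / dtype(k + 1)
    return S, largest

S64, big64 = taylor_exp(A, 59, np.float64)
S32, big32 = taylor_exp(A, 59, np.float32)
err64 = np.abs(S64 - expA).max()
err32 = np.abs(S32.astype(np.float64) - expA).max()
print(f"largest partial-sum entry: {big64:.1e}")
print(f"error in float64: {err64:.1e}, error in float32: {err32:.1e}")
```

The double-precision sum is accurate because its unit roundoff is small enough to absorb partial sums of size about $10^{7}$. Single precision is not.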


Example 11.2.2 highlights a shortcoming of truncated Taylor series
approximation: it tends to be worthwhile only near the origin. The problem can
sometimes be circumvented through a change of scale. For example, by
repeated application of the double angle formulae

$$\cos(2A) = 2\cos(A)^2 - I, \qquad \sin(2A) = 2\sin(A)\cos(A),$$

it is possible to “build up” the sine and cosine of a matrix from suitably
truncated Taylor series approximations:

S_0 = Taylor approximation to sin(A/2^k)
C_0 = Taylor approximation to cos(A/2^k)
for j = 1:k
    S_j = 2 S_{j-1} C_{j-1}
    C_j = 2 C_{j-1}^2 - I
end

Here k is a positive integer chosen so that, say, $\|A\|_\infty \approx 2^k$. See Serbin
and Blalock (1979).
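The double-angle scheme above can be written in a few lines of NumPy. The rule for k used below (the smallest positive integer with $\|A/2^k\|_\infty \le 1$) and the number of Taylor terms are assumptions of this illustration, as is the function name. The check uses a symmetric A, for which sin(A) and cos(A) also follow from an eigendecomposition.

```python
import math
import numpy as np

def sin_cos_double_angle(A, terms=10):
    """sin(A), cos(A) from Taylor approximations at A/2^k followed by k double-angle steps."""
    n = A.shape[0]
    I = np.eye(n)
    norm_inf = np.linalg.norm(A, np.inf)
    # smallest positive integer k with ||A/2^k||_inf <= 1, so ||A||_inf is roughly 2^k
    k = max(1, math.ceil(math.log2(norm_inf))) if norm_inf > 0 else 1
    B = A / 2**k
    B2 = B @ B
    S, C = np.zeros((n, n)), np.zeros((n, n))
    s_term, c_term = B.copy(), I.copy()       # B^1/1! and B^0/0!
    for i in range(terms):
        S += s_term                           # adds (-1)^i B^(2i+1)/(2i+1)!
        C += c_term                           # adds (-1)^i B^(2i)/(2i)!
        s_term = -(s_term @ B2) / ((2 * i + 2) * (2 * i + 3))
        c_term = -(c_term @ B2) / ((2 * i + 1) * (2 * i + 2))
    for _ in range(k):
        S, C = 2 * S @ C, 2 * C @ C - I       # S_j = 2 S C, C_j = 2 C^2 - I (old S, C on the right)
    return S, C

rng = np.random.default_rng(1)
M = rng.standard_normal((4, 4))
A = 2.0 * (M + M.T)                           # symmetric, so eigh gives a reference
lam, V = np.linalg.eigh(A)
S, C = sin_cos_double_angle(A)
S_ref = V @ np.diag(np.sin(lam)) @ V.T
C_ref = V @ np.diag(np.cos(lam)) @ V.T
```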
11.2.4 Evaluating Matrix Polynomials


Since the approximation of transcendental matrix functions so often
involves the evaluation of polynomials, it is worthwhile to look at the details
of computing

$$p(A) = b_0I + b_1A + \cdots + b_qA^q$$

where the scalars $b_0,\ldots,b_q \in \mathbb{R}$ are given. The most obvious approach is
to invoke Horner’s scheme:


Algorithm 11.2.1 Given a matrix A and b(0:q), the following algorithm
computes $F = b_qA^q + \cdots + b_1A + b_0I$.

F = b_q A + b_{q-1} I
for k = q-2:-1:0
    F = AF + b_k I
end


This requires q — 1 matrix multiplications. However, unlike the scalar case, 
this summation process is not optimal. To see why, suppose q = 9 and 
observe that 


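$$p(A) = A^3\left(A^3\left(b_9A^3 + (b_8A^2 + b_7A + b_6I)\right) + (b_5A^2 + b_4A + b_3I)\right) + b_2A^2 + b_1A + b_0I.$$


Thus, F = p(A) can be evaluated with only four matrix multiplies:

$$\begin{aligned}
A_2 &= A^2 \\
A_3 &= AA_2 \\
F_1 &= b_9A_3 + b_8A_2 + b_7A + b_6I \\
F_2 &= A_3F_1 + b_5A_2 + b_4A + b_3I \\
F &= A_3F_2 + b_2A_2 + b_1A + b_0I.
\end{aligned}$$


In general, if s is any integer satisfying $1 \le s \le \sqrt{q}$, then

$$p(A) = \sum_{k=0}^{r}B_k\,(A^s)^k, \qquad r = \mathrm{floor}(q/s), \qquad (11.2.5)$$

where

$$B_k = \begin{cases} b_{sk+s-1}A^{s-1} + \cdots + b_{sk+1}A + b_{sk}I & k = 0:r-1 \\ b_qA^{q-sr} + \cdots + b_{sr+1}A + b_{sr}I & k = r. \end{cases}$$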


Once $A^2,\ldots,A^s$ are computed, Horner’s rule can be applied to (11.2.5)
and the net result is that p(A) can be computed with $s + r - 1$ matrix
multiplies. By choosing $s = \mathrm{floor}(\sqrt{q})$, the number of matrix multiplies
is approximately minimized. This technique is discussed in Paterson and
Stockmeyer (1973). Van Loan (1978) shows how the procedure can be
implemented without storage arrays for $A^2,\ldots,A^s$.
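Algorithm 11.2.1 and the Paterson-Stockmeyer splitting (11.2.5) are compared below. Both functions are illustrative and count the matrix products they perform. For q = 9 and s = 3 the general scheme uses $s + r - 1 = 5$ products; the hand-tuned schedule in the text saves one more by folding $b_9A^3$ into $F_1$. The function names and random test data are assumptions of this code.

```python
import math
import numpy as np

def horner(b, A):
    """Algorithm 11.2.1: p(A) = b[q] A^q + ... + b[1] A + b[0] I (q >= 1); q - 1 products."""
    q = len(b) - 1
    I = np.eye(A.shape[0])
    F = b[q] * A + b[q - 1] * I
    mults = 0
    for k in range(q - 2, -1, -1):
        F = A @ F + b[k] * I
        mults += 1
    return F, mults

def paterson_stockmeyer(b, A, s=None):
    """Evaluate p(A) by (11.2.5): p(A) = sum_k B_k (A^s)^k with r = floor(q/s)."""
    q = len(b) - 1
    n = A.shape[0]
    if s is None:
        s = max(1, math.isqrt(q))              # s = floor(sqrt(q))
    r = q // s
    powers = [np.eye(n), A]                    # powers[j] = A^j for j = 0..s
    mults = 0
    for _ in range(2, s + 1):
        powers.append(A @ powers[-1])
        mults += 1

    def B(k):
        # B_k = sum_j b[s*k + j] A^j with j <= s-1 for k < r, and j <= q - s*r for k = r
        top = s - 1 if k < r else q - s * r
        return sum(b[s * k + j] * powers[j] for j in range(top + 1))

    F = B(r)
    for k in range(r - 1, -1, -1):             # Horner's rule in the variable A^s
        F = F @ powers[s] + B(k)
        mults += 1
    return F, mults

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 5)) / 3
b = rng.standard_normal(10)                    # q = 9
F_h, m_h = horner(b, A)
F_ps, m_ps = paterson_stockmeyer(b, A)
F_ref = sum(b[k] * np.linalg.matrix_power(A, k) for k in range(10))
print(m_h, m_ps)                               # 8 5
```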


11.2.5 Computing Powers of a Matrix 


The problem of raising a matrix to a given power deserves special mention. 
Suppose it is required to compute $A^{13}$. Noting that $A^4 = (A^2)^2$, $A^8 =
(A^4)^2$ and $A^{13} = A^8A^4A$, we see that this can be accomplished with just 5
matrix multiplications. In general we have


Algorithm 11.2.2 (Binary Powering) Given a positive integer s and
$A \in \mathbb{R}^{n\times n}$, the following algorithm computes $F = A^s$.

Let $s = \sum_{k=0}^{t}\beta_k2^k$ be the binary expansion of s with $\beta_t \ne 0$.
Z = A; q = 0
while β_q = 0
    Z = Z^2; q = q + 1
end
F = Z
for k = q+1:t
    Z = Z^2
    if β_k ≠ 0
        F = FZ
    end
end


This algorithm requires at most $2\,\mathrm{floor}[\log_2(s)]$ matrix multiplies. If s is a
power of 2, then only $\log_2(s)$ matrix multiplies are needed.
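Algorithm 11.2.2 translates directly into Python. The function name is hypothetical, and the product counter is there only to confirm the operation counts quoted above: 5 products for $A^{13}$ and $\log_2(8) = 3$ for $A^8$. An integer Fibonacci matrix keeps the arithmetic exact.

```python
import numpy as np

def binary_power(A, s):
    """Algorithm 11.2.2: F = A^s for a positive integer s; returns (F, number of matrix products)."""
    if s < 1:
        raise ValueError("s must be a positive integer")
    beta = [int(c) for c in reversed(bin(s)[2:])]   # s = sum_k beta[k] 2^k with beta[t] = 1
    t = len(beta) - 1
    Z, q, mults = A.copy(), 0, 0
    while beta[q] == 0:                        # square until the lowest nonzero bit
        Z = Z @ Z
        q += 1
        mults += 1
    F = Z.copy()
    for k in range(q + 1, t + 1):
        Z = Z @ Z                              # Z = A^(2^k)
        mults += 1
        if beta[k] != 0:
            F = F @ Z
            mults += 1
    return F, mults

A = np.array([[1, 1], [1, 0]], dtype=np.int64)  # Fibonacci matrix: A^n holds F_{n+1}, F_n, F_{n-1}
F13, m13 = binary_power(A, 13)
F8, m8 = binary_power(A, 8)
print(F13[0, 1], m13)                          # 233 5
```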


11.2.6 Integrating Matrix Functions 


We conclude this section with some remarks on the integration of matrix 
functions. Suppose f(At) is defined for all $t \in [a,b]$ and that we wish to
compute


$$F = \int_a^b f(At)\,dt.$$


As in (11.1.1) the integration is on an element-by-element basis.
Ordinary quadrature rules can be applied to F. For example, with
Simpson’s rule, we have

$$F \approx \tilde{F} = \frac{h}{3}\sum_{k=0}^{m}w_kf(A(a+kh)) \qquad (11.2.6)$$

where m is even, h = (b − a)/m and

$$w_k = \begin{cases} 1 & k = 0, m \\ 4 & k \text{ odd} \\ 2 & k \text{ even},\ k \ne 0, m. \end{cases}$$

If $(d^4/dz^4)f(zt) = f^{(4)}(zt)$ is continuous for $t \in [a,b]$ and if $f^{(4)}(At)$ is
defined on this same interval, then it can be shown that $F = \tilde{F} + E$ where

$$\|E\|_2 \le \frac{nh^4(b-a)}{180}\max_{a\le t\le b}\|f^{(4)}(At)\|_2. \qquad (11.2.7)$$

Let $\tilde{f}_{ij}$ and $e_{ij}$ denote the (i, j) entries of $\tilde{F}$ and E, respectively. Under the
above assumptions we can apply the standard error bounds for Simpson’s
rule and obtain

$$|e_{ij}| \le \frac{h^4(b-a)}{180}\max_{a\le t\le b}\left|e_i^Tf^{(4)}(At)e_j\right|.$$

The inequality (11.2.7) now follows since $\|E\|_2 \le n\max|e_{ij}|$ and

$$\max_{a\le t\le b}\left|e_i^Tf^{(4)}(At)e_j\right| \le \max_{a\le t\le b}\|f^{(4)}(At)\|_2.$$

Of course, in the practical application of (11.2.6), the function evaluations 
f(A(a + kh)) normally have to be approximated. Thus, the overall error 
involves the error in approximating f(A(a + kh)) as well as the Simpson rule
error. 
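Rule (11.2.6) can be exercised on $F = \int_0^1 e^{At}\,dt$, whose closed form $A^{-1}(e^A - I)$ is available when A is nonsingular. To keep the function evaluations exact, this code uses a symmetric A and evaluates $e^{At}$ through an eigendecomposition. That choice, the helper names, and the test matrix are assumptions of the illustration. Doubling m should shrink the error by roughly $2^4 = 16$, in line with the $h^4$ factor in (11.2.7).

```python
import numpy as np

def expm_sym(M):
    """exp(M) for symmetric M via eigh (used here only to supply exact f(At) values)."""
    lam, V = np.linalg.eigh(M)
    return (V * np.exp(lam)) @ V.T             # V diag(e^lam) V^T

def simpson_matrix(f, A, a, b, m):
    """Composite Simpson rule (11.2.6) for F = integral_a^b f(A t) dt; m must be even."""
    if m % 2:
        raise ValueError("m must be even")
    h = (b - a) / m
    F = np.zeros_like(A, dtype=float)
    for k in range(m + 1):
        w = 1 if k in (0, m) else (4 if k % 2 else 2)
        F += w * f(A * (a + k * h))
    return (h / 3) * F

A = np.array([[1.0, 0.5], [0.5, -0.5]])
F_exact = np.linalg.solve(A, expm_sym(A) - np.eye(2))   # A^{-1}(e^A - I)
err_20 = np.abs(simpson_matrix(expm_sym, A, 0.0, 1.0, 20) - F_exact).max()
err_40 = np.abs(simpson_matrix(expm_sym, A, 0.0, 1.0, 40) - F_exact).max()
print(f"m = 20: {err_20:.1e}, m = 40: {err_40:.1e}, ratio {err_20 / err_40:.1f}")
```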


Problems 


P11.2.1 (a) Suppose $G = \lambda I + E$ is a p-by-p Jordan block, where $E = (\delta_{i,j-1})$. Show
that

$$(\lambda I + E)^k = \sum_{j=0}^{\min(p-1,k)}\binom{k}{j}\lambda^{k-j}E^j.$$

(b) Use (a) and Theorem 11.1.1 to prove Theorem 11.2.3.
P11.2.2 Verify (11.2.2). 
P11.2.3 Show that if $\|A\|_2 < 1$, then log(I + A) exists and satisfies the bound
$$\|\log(I + A)\|_2 \le \|A\|_2/(1 - \|A\|_2).$$

P11.2.4 Let A be an n-by-n symmetric positive definite matrix. (a) Show that there
exists a unique symmetric positive definite X such that $A = X^2$. (b) Show that if
$X_0 = I$ and $X_{k+1} = (X_k + AX_k^{-1})/2$ then $X_k \to \sqrt{A}$ quadratically where $\sqrt{A}$ denotes
the matrix X in part (a).

P11.2.5 Specialize Algorithm 11.2.1 to the case when A is symmetric. Repeat for the 
case when A is upper triangular. In both instances, give the associated flop counts. 
P11.2.6 Show that $X(t) = C_1\cos(t\sqrt{A}) + C_2\sqrt{A}^{-1}\sin(t\sqrt{A})$ solves the initial value
problem $\ddot{X}(t) = -AX(t)$, $X(0) = C_1$, $\dot{X}(0) = C_2$. Assume that A is symmetric positive
definite.

P11.2.7 Using Theorem 11.2.4, bound the error in the approximations:

$$\sin(A) \approx \sum_{k=0}^{q}(-1)^k\frac{A^{2k+1}}{(2k+1)!}, \qquad \cos(A) \approx \sum_{k=0}^{q}(-1)^k\frac{A^{2k}}{(2k)!}.$$


P11.2.8 Suppose $A \in \mathbb{R}^{n\times n}$ is nonsingular and $X_0 \in \mathbb{R}^{n\times n}$ is given. The iteration
defined by
$$X_{k+1} = X_k(2I - AX_k)$$


is the matrix analog of Newton’s method applied to the function f(x) = a − (1/x). Use
the SVD to analyze this iteration. Do the iterates converge to $A^{-1}$? Discuss the choice
of $X_0$.


Notes and References for Sec. 11.2 


The optimality of Horner’s rule for polynomial evaluation is discussed in 


D. Knuth (1981). The Art of Computer Programming, vol. 2. Seminumerical
Algorithms, 2nd ed., Addison-Wesley, Reading, Massachusetts.

M.S. Paterson and L.J. Stockmeyer (1973). “On the Number of Nonscalar Multiplica- 
tions Necessary to Evaluate Polynomials,” SIAM J. Comp. 2, 60-66. 


The Horner evaluation of matrix polynomials is analyzed in 


C.F. Van Loan (1978). “A Note on the Evaluation of Matrix Polynomials,” IEEE Trans.
Auto. Cont. AC-24, 320-21. 


Other aspects of matrix function computation are discussed in 


N.J. Higham and P.A. Knight (1995). “Matrix Powers in Finite Precision Arithmetic,” 
SIAM J. Matrix Anal. Appl. 16, 343-358.

R. Mathias (1993). “Approximation of Matrix-Valued Functions,” SIAM J. Matrix Anal.
Appl. 14, 1061-1063.

S. Friedland (1991). “Revisiting Matrix Squaring,” Lin. Alg. and Its Applic. 154-156, 
59-63. 

H. Bolz and W. Niethammer (1988). “On the Evaluation of Matrix Functions Given by 
Power Series,” SIAM J. Matrix Anal. Appl. 9, 202-209. 


The Newton and Lagrange representations for f(A) and their relationship to other
matrix function definitions are discussed in


R.F. Rinehart (1955). “The Equivalence of Definitions of a Matric Function,” Amer. 
Math. Monthly 62, 395-414. 
The “double angle” method for computing the cosine of a matrix is analyzed in


S. Serbin and S. Blalock (1979). “An Algorithm for Computing the Matrix Cosine,” 
SIAM J. Sci. Stat. Comp. 1, 198-204.


The square root is a particularly important matrix function. See §4.2.10. Several
approaches are possible:


Å. Björck and S. Hammarling (1983). “A Schur Method for the Square Root of a Matrix,”
Lin. Alg. and Its Applic. 52/53, 127-140.

N.J. Higham (1986). “Newton’s Method for the Matrix Square Root,” Math. Comp.
46, 531-550.

N.J. Higham (1987). “Computing Real Square Roots of a Real Matrix,” Lin. Alg. and
Its Applic. 88/89, 405-430.


11.3 The Matrix Exponential 


One of the most frequently computed matrix functions is the exponential 


$$ e^{At} = \sum_{k=0}^{\infty} \frac{(At)^k}{k!} . $$


Numerous algorithms for computing $e^{At}$ have been proposed, but most of
them are of dubious numerical quality, as is pointed out in the survey article
by Moler and Van Loan (1978). In order to illustrate what the computational
difficulties are, we present a “scaling and squaring” method based
upon Padé approximation. A brief analysis of the method follows that involves
some $e^{At}$ perturbation theory and comments about the shortcomings
of eigenanalysis in settings where non-normality prevails.


11.3.1 A Padé Approximation Method 


Following the discussion in §11.2, if $g(z) \approx e^z$, then $g(A) \approx e^A$. A very
useful class of approximants for this purpose are the Padé functions defined by

$$ R_{pq}(z) = D_{pq}(z)^{-1} N_{pq}(z) , $$

where

$$ N_{pq}(z) = \sum_{k=0}^{p} \frac{(p+q-k)!\,p!}{(p+q)!\,k!\,(p-k)!}\, z^k $$

and

$$ D_{pq}(z) = \sum_{k=0}^{q} \frac{(p+q-k)!\,q!}{(p+q)!\,k!\,(q-k)!}\, (-z)^k . $$

Notice that $R_{p0}(z) = 1 + z + \cdots + z^p/p!$ is the $p$th-order Taylor polynomial.
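As a quick check on these definitions, the following Python sketch computes the coefficients of $N_{pq}$ and $D_{pq}$ in exact rational arithmetic and compares the scalar approximant $R_{22}(z)$ with $e^z$ at a few points. The helper names `pade_coeffs` and `pade_eval` are our own; this is an illustration, not part of any library.

```python
from fractions import Fraction
from math import exp, factorial


def pade_coeffs(p, q):
    """Exact coefficients of N_pq(z) and D_pq(z) as defined above.

    The (-1)^k sign of D_pq is folded into its coefficients, so both
    polynomials are evaluated as sum(c[k] * z**k).
    """
    s = factorial(p + q)
    n = [Fraction(factorial(p + q - k) * factorial(p),
                  s * factorial(k) * factorial(p - k)) for k in range(p + 1)]
    d = [Fraction((-1) ** k * factorial(p + q - k) * factorial(q),
                  s * factorial(k) * factorial(q - k)) for k in range(q + 1)]
    return n, d


def pade_eval(p, q, z):
    """Evaluate the scalar approximant R_pq(z) = N_pq(z) / D_pq(z)."""
    n, d = pade_coeffs(p, q)
    num = sum(float(c) * z ** k for k, c in enumerate(n))
    den = sum(float(c) * z ** k for k, c in enumerate(d))
    return num / den


n22, d22 = pade_coeffs(2, 2)
print([str(c) for c in n22], [str(c) for c in d22])
# ['1', '1/2', '1/12'] ['1', '-1/2', '1/12']
for z in (0.1, 1.0, 10.0):
    # relative error of R_22(z) as an approximation to e^z
    print(z, abs(pade_eval(2, 2, z) - exp(z)) / exp(z))
```

For $p = q = 2$ this reproduces $(1 + z/2 + z^2/12)/(1 - z/2 + z^2/12)$, and the relative error grows rapidly as $z$ moves away from the origin.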
Unfortunately, the Padé approximants are good only near the origin, as
the following identity reveals:

$$ e^A = R_{pq}(A) + \frac{(-1)^q}{(p+q)!}\, A^{p+q+1} D_{pq}(A)^{-1} \int_0^1 u^p (1-u)^q e^{A(1-u)}\, du . \qquad (11.3.1) $$


However, this problem can be overcome by exploiting the fact that $e^A =
(e^{A/m})^m$. In particular, we can scale $A$ by $m$ such that $F_{pq} = R_{pq}(A/m)$
is a suitably accurate approximation to $e^{A/m}$. We then compute $F_{pq}^m$ using
Algorithm 11.2.2. If $m$ is a power of two, then this amounts to repeated
squaring and so is very efficient. The success of the overall procedure
depends on the accuracy of the approximant

$$ F_{pq} = \left( R_{pq}\!\left( \frac{A}{2^j} \right) \right)^{2^j} . $$
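A scalar version of the scaling-and-squaring idea makes the effect of $m = 2^j$ visible. The sketch below, which fixes $p = q = 2$ purely for illustration, approximates $e^{10}$ by $(R_{22}(10/2^j))^{2^j}$ for a few values of $j$; the function names are hypothetical.

```python
from math import exp


def r22(z):
    """Scalar (2,2) Pade approximant (1 + z/2 + z^2/12) / (1 - z/2 + z^2/12)."""
    return (1 + z / 2 + z * z / 12) / (1 - z / 2 + z * z / 12)


def scale_and_square(a, j):
    """Approximate e^a by (R_22(a / 2^j))^(2^j), using j repeated squarings."""
    f = r22(a / 2 ** j)
    for _ in range(j):
        f = f * f
    return f


a = 10.0
for j in (0, 4, 8):
    rel_err = abs(scale_and_square(a, j) - exp(a)) / exp(a)
    print(f"j = {j}: relative error {rel_err:.1e}")
```

Each halving of the argument shrinks the local Padé error by roughly a factor of $2^{p+q+1}$, while the $j$ squarings only multiply the relative error by about $2^j$, so a moderate $j$ suffices.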


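In Moler and Van Loan (1978) it is shown that if

$$ \frac{\| A \|_\infty}{2^j} \le \frac{1}{2} , $$

then there exists an $E \in \mathbb{R}^{n \times n}$ such that

$$ F_{pq} = e^{A+E}, \qquad AE = EA, \qquad \| E \|_\infty \le \epsilon(p,q) \| A \|_\infty , $$

$$ \epsilon(p,q) = 2^{3-(p+q)} \frac{p!\, q!}{(p+q)!\,(p+q+1)!} . $$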


These results form the basis of an effective $e^A$ procedure with error control.
Using the above formulae it is easy to establish the inequality:

$$ \frac{\| e^A - F_{pq} \|_\infty}{\| e^A \|_\infty} \le \epsilon(p,q) \| A \|_\infty \, e^{\epsilon(p,q) \| A \|_\infty} . $$


The parameters $p$ and $q$ can be determined according to some relative
error tolerance. Note that since $F_{pq}$ requires about $j + \max(p,q)$ matrix
multiplies, it makes sense to set $p = q$, as this choice minimizes $\epsilon(p,q)$ for a
given amount of work. Encapsulating these ideas we obtain


Algorithm 11.3.1 Given $\delta > 0$ and $A \in \mathbb{R}^{n \times n}$, the following algorithm
computes $F = e^{A+E}$ where $\| E \|_\infty \le \delta \| A \|_\infty$.

    j = max(0, 1 + floor(log2(‖A‖∞)))
    A = A/2^j
    Let q be the smallest non-negative integer such that ε(q,q) ≤ δ.
    D = I; N = I; X = I; c = 1
    for k = 1:q
        c = c(q - k + 1)/[(2q - k + 1)k]
        X = AX; N = N + cX; D = D + (-1)^k cX
    end
    Solve DF = N for F using Gaussian elimination.
    for k = 1:j
        F = F^2
    end


This algorithm requires about $2(q + j + 1/3)n^3$ flops. Its roundoff error
properties have essentially been analyzed by Ward (1977).
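The following is a minimal pure-Python transcription of Algorithm 11.3.1 for small dense matrices stored as lists of rows. It is a sketch under stated assumptions, not production code: $\delta$ defaults to the unit roundoff $2^{-53}$ (our choice), $\epsilon(q,q)$ is evaluated exactly with `fractions.Fraction`, and the helper names (`matmul`, `solve`, `eps`, `expm_pade`) are hypothetical. The test matrix is the 2-by-2 example with $M = 10$ from the discussion of non-normality in this section, whose exponential is known in closed form.

```python
import math
from fractions import Fraction


def matmul(X, Y):
    """Product of two dense matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def solve(D, N):
    """Solve D F = N by Gaussian elimination with partial pivoting."""
    n, m = len(D), len(N[0])
    M = [D[i][:] + N[i][:] for i in range(n)]          # augmented [D | N]
    for k in range(n):
        piv = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[piv] = M[piv], M[k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + m):
                M[i][j] -= f * M[k][j]
    F = [[0.0] * m for _ in range(n)]
    for c in range(m):                                  # back substitution per column
        for i in reversed(range(n)):
            s = M[i][n + c] - sum(M[i][j] * F[j][c] for j in range(i + 1, n))
            F[i][c] = s / M[i][i]
    return F


def eps(p, q):
    """epsilon(p, q) = 2^(3-(p+q)) p! q! / ((p+q)! (p+q+1)!), computed exactly."""
    f = math.factorial
    return Fraction(2) ** (3 - p - q) * f(p) * f(q) / (f(p + q) * f(p + q + 1))


def expm_pade(A, delta=2.0 ** -53):
    """Scaling and squaring with the diagonal Pade approximant R_qq."""
    n = len(A)
    norm = max(sum(abs(v) for v in row) for row in A)  # infinity norm
    j = max(0, 1 + math.floor(math.log2(norm))) if norm > 0 else 0
    A = [[v / 2 ** j for v in row] for row in A]
    q = 0
    while eps(q, q) > delta:                            # smallest q with eps(q,q) <= delta
        q += 1
    I = [[float(i == k) for k in range(n)] for i in range(n)]
    D, N, X, c = [r[:] for r in I], [r[:] for r in I], I, 1.0
    for k in range(1, q + 1):
        c = c * (q - k + 1) / ((2 * q - k + 1) * k)
        X = matmul(A, X)
        N = [[N[i][l] + c * X[i][l] for l in range(n)] for i in range(n)]
        D = [[D[i][l] + (-1) ** k * c * X[i][l] for l in range(n)] for i in range(n)]
    F = solve(D, N)
    for _ in range(j):                                  # undo the scaling by squaring
        F = matmul(F, F)
    return F


E = expm_pade([[-1.0, 10.0], [0.0, -1.0]])
print(E)   # compare with e^{-1} [[1, 10], [0, 1]]
```

With $\delta = 2^{-53}$ the loop selects $q = 7$, since $\epsilon(6,6) \approx 3.4 \times 10^{-16}$ exceeds the tolerance while $\epsilon(7,7) \approx 1.1 \times 10^{-19}$ does not.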

The special Horner techniques of §11.2 can be applied to quicken the
computation of $D = D_{qq}(A)$ and $N = N_{qq}(A)$. For example, if $q = 8$ we
have $N_{qq}(A) = U + AV$ and $D_{qq}(A) = U - AV$ where

$$ U = c_0 I + c_2 A^2 + (c_4 I + c_6 A^2 + c_8 A^4) A^4 $$

and

$$ V = c_1 I + c_3 A^2 + (c_5 I + c_7 A^2) A^4 , $$

with $c_k$ the coefficient of $z^k$ in $N_{qq}(z)$. Clearly, $N$ and $D$ can be found in 5 matrix multiplies rather than the 7
required by Algorithm 11.3.1.
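To see the savings concretely, the sketch below builds $N_{88}(A)$ and $D_{88}(A)$ both ways for a small hypothetical matrix and counts matrix multiplications with a counting wrapper (`Mult`, our own name). The coefficients $c_k$ are those of $N_{qq}$ from the definition of the Padé functions.

```python
from math import factorial


def pade_diag_coeffs(q):
    """c_k = (2q-k)! q! / ((2q)! k! (q-k)!): the coefficients of N_qq(z)."""
    return [factorial(2 * q - k) * factorial(q)
            / (factorial(2 * q) * factorial(k) * factorial(q - k)) for k in range(q + 1)]


class Mult:
    """Matrix multiply that counts how many times it is called."""
    def __init__(self):
        self.count = 0

    def __call__(self, X, Y):
        self.count += 1
        return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
                 for j in range(len(Y[0]))] for i in range(len(X))]


def lin(*terms):
    """Linear combination sum(coef * M) of square matrices of equal size."""
    n = len(terms[0][1])
    return [[sum(c * M[i][j] for c, M in terms) for j in range(n)] for i in range(n)]


def eye(n):
    return [[float(i == j) for j in range(n)] for i in range(n)]


def nd_split(A, mm):
    """N_88(A) = U + AV and D_88(A) = U - AV using 5 multiplies."""
    c, I = pade_diag_coeffs(8), eye(len(A))
    A2 = mm(A, A)
    A4 = mm(A2, A2)
    U = lin((c[0], I), (c[2], A2), (1.0, mm(lin((c[4], I), (c[6], A2), (c[8], A4)), A4)))
    V = lin((c[1], I), (c[3], A2), (1.0, mm(lin((c[5], I), (c[7], A2)), A4)))
    AV = mm(A, V)
    return lin((1.0, U), (1.0, AV)), lin((1.0, U), (-1.0, AV))


def nd_direct(A, mm):
    """The same N and D accumulated power by power, as in Algorithm 11.3.1."""
    c, I = pade_diag_coeffs(8), eye(len(A))
    N, D, X = I, I, [row[:] for row in A]       # X = A^1 costs no multiply
    for k in range(1, 9):
        if k > 1:
            X = mm(A, X)
        N = lin((1.0, N), (c[k], X))
        D = lin((1.0, D), ((-1) ** k * c[k], X))
    return N, D


A = [[0.1, 0.2], [-0.3, 0.05]]
m1, m2 = Mult(), Mult()
Ns, Ds = nd_split(A, m1)
Nd, Dd = nd_direct(A, m2)
print(m1.count, m2.count)   # 5 7
```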


11.3.2 Perturbation Theory 


Is Algorithm 11.3.1 stable in the presence of roundoff error? To answer this
question we need to understand the sensitivity of the matrix exponential to
perturbations in $A$. The starting point in the discussion is the initial value
problem

$$ \dot X(t) = A X(t), \qquad X(0) = I , $$

where $A, X(t) \in \mathbb{R}^{n \times n}$. This has the unique solution $X(t) = e^{At}$, a
characterization of the matrix exponential that can be used to establish the
identity


$$ e^{(A+E)t} - e^{At} = \int_0^t e^{A(t-s)} E\, e^{(A+E)s}\, ds . $$


From this it follows that 


$$ \frac{\| e^{(A+E)t} - e^{At} \|_2}{\| e^{At} \|_2} \le \frac{\| E \|_2}{\| e^{At} \|_2} \int_0^t \| e^{A(t-s)} \|_2 \, \| e^{(A+E)s} \|_2 \, ds . $$


Further simplifications result if we bound the norms of the exponentials
that appear in the integrand. One way of doing this is through the Schur
decomposition. If $Q^H A Q = \mathrm{diag}(\lambda_i) + N$ is the Schur decomposition of
$A \in \mathbb{C}^{n \times n}$, then it can be shown that

$$ \| e^{At} \|_2 \le e^{\alpha(A) t} M_S(t) \qquad (11.3.2) $$
where

$$ \alpha(A) = \max\{ \mathrm{Re}(\lambda) : \lambda \in \lambda(A) \} \qquad (11.3.3) $$

and

$$ M_S(t) = \sum_{k=0}^{n-1} \frac{\| N t \|_2^k}{k!} . $$


The quantity $\alpha(A)$ is called the spectral abscissa and with a little
manipulation it can be shown that

$$ \frac{\| e^{(A+E)t} - e^{At} \|_2}{\| e^{At} \|_2} \le t \| E \|_2 \, M_S(t)^2 \exp\!\big( t M_S(t) \| E \|_2 \big) . $$

Notice that $M_S(t) \equiv 1$ if and only if $A$ is normal, suggesting that the matrix
exponential problem is “well behaved” if $A$ is normal. This observation
is confirmed by the behavior of the matrix exponential condition number
$\nu(A,t)$, defined by

$$ \nu(A,t) = \max_{\| E \|_2 \le 1} \left\| \int_0^t e^{A(t-s)} E\, e^{As}\, ds \right\|_2 \frac{\| A \|_2}{\| e^{At} \|_2} . $$

This quantity, discussed in Van Loan (1977), measures the sensitivity of
the map $A \to e^{At}$ in that for a given $t$, there is a matrix $E$ for which

$$ \frac{\| e^{(A+E)t} - e^{At} \|_2}{\| e^{At} \|_2} \approx \nu(A,t) \frac{\| E \|_2}{\| A \|_2} . $$

Thus, if $\nu(A,t)$ is large, small changes in $A$ can induce relatively large
changes in $e^{At}$. Unfortunately, it is difficult to characterize precisely those
$A$ for which $\nu(A,t)$ is large. (This is in contrast to the linear equation
problem $Ax = b$, where the ill-conditioned $A$ are neatly described in terms
of the SVD.) One thing we can say, however, is that $\nu(A,t) \ge t \| A \|_2$, with
equality holding for all non-negative $t$ if and only if $A$ is normal.

Dwelling a little more on the effect of non-normality, we know from the
analysis of §11.2 that approximating $e^{At}$ involves more than just approximating
$e^{zt}$ on $\lambda(A)$. Another clue that eigenvalues do not “tell the whole
story” in the $e^{At}$ problem has to do with the inability of the spectral abscissa
(11.3.3) to predict the size of $\| e^{At} \|_2$ as a function of time. If $A$ is
normal, then

$$ \| e^{At} \|_2 = e^{\alpha(A) t} . \qquad (11.3.4) $$

Thus, there is uniform decay if the eigenvalues of $A$ are in the open left half
plane. But if $A$ is non-normal, then $e^{At}$ can grow before decay “sets in.”
The 2-by-2 example

$$ A = \begin{bmatrix} -1 & M \\ 0 & -1 \end{bmatrix}, \qquad e^{At} = e^{-t} \begin{bmatrix} 1 & tM \\ 0 & 1 \end{bmatrix} $$

plainly illustrates this point.
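The transient growth is easy to tabulate. The sketch below evaluates $\| e^{At} \|_2$ for the 2-by-2 example with $M = 10$ (an arbitrary choice) from its closed form, and compares it with $e^{\alpha(A)t} = e^{-t}$ and with the Schur bound (11.3.2). Since this $A$ is already upper triangular we may take $Q = I$ and $N = [0 \;\, M;\, 0 \;\, 0]$, so $M_S(t) = 1 + Mt$. Function names are our own.

```python
import math


def norm2_2x2(B):
    """2-norm of a real 2-by-2 matrix: sqrt of the largest eigenvalue of B^T B."""
    (a, b), (c, d) = B
    p, s, r = a * a + c * c, b * b + d * d, a * b + c * d   # entries of B^T B
    return math.sqrt((p + s + math.sqrt((p - s) ** 2 + 4 * r * r)) / 2)


def exp_At(M, t):
    """Closed form of e^{At} for A = [[-1, M], [0, -1]]."""
    e = math.exp(-t)
    return [[e, e * t * M], [0.0, e]]


M = 10.0
for t in (0.0, 0.5, 1.0, 2.0, 5.0, 20.0):
    actual = norm2_2x2(exp_At(M, t))
    spectral = math.exp(-t)                   # e^{alpha(A) t} with alpha(A) = -1
    schur = math.exp(-t) * (1 + M * t)        # bound (11.3.2) with M_S(t) = 1 + Mt
    print(f"t = {t:4.1f}  ||e^At|| = {actual:9.3e}  e^(alpha t) = {spectral:9.3e}  bound = {schur:9.3e}")
```

The norm exceeds 1 near $t = 1$ even though $e^{\alpha(A)t}$ decays from the start; only for larger $t$ does the decay take over.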
11.3.3 Some Stability Issues


With this discussion we are ready to begin thinking about the stability of
Algorithm 11.3.1. A potential difficulty arises during the squaring process
if $A$ is a matrix whose exponential grows before it decays. If

$$ G = R_{qq}\!\left( \frac{A}{2^j} \right) \approx e^{A/2^j} , $$

then it can be shown that rounding errors of order

$$ \gamma = u \, \| G^2 \|_2 \, \| G^4 \|_2 \, \| G^8 \|_2 \cdots \| G^{2^{j-1}} \|_2 $$

can be expected to contaminate the computed $G^{2^j}$. If $\| e^{At} \|_2$ has a
substantial initial growth, then it may be the case that

$$ \gamma \gg u \| G^{2^j} \|_2 \approx u \| e^A \|_2 , $$

thus ruling out the possibility of small relative errors.
If $A$ is normal, then so is the matrix $G$ and therefore $\| G^m \|_2 = \| G \|_2^m$
for all positive integers $m$. Thus, $\gamma \approx u \| G^{2^j} \|_2 \approx u \| e^A \|_2$ and so the
initial growth problems disappear. The algorithm can essentially be guaranteed
to produce small relative error when $A$ is normal. On the other
hand, it is more difficult to draw conclusions about the method when $A$ is
non-normal because the connection between $\nu(A,t)$ and the initial growth
phenomena is unclear. However, numerical experiments suggest that Algorithm
11.3.1 fails to produce a relatively accurate $e^A$ only when $\nu(A,1)$ is
correspondingly large.
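The following sketch puts rough numbers on this for $A = [-a \;\, M;\, 0 \;\, -a]$, whose exponential is known in closed form. It makes two simplifying assumptions, both ours: $G$ is taken to be the exact $e^{A/2^j}$ (the Padé error is ignored), and only the ratio $\gamma / (u \| G^{2^j} \|_2)$ is reported, so constants in the error estimate are dropped. The choices $a = 1$, $M = 100$, $j = 6$ are arbitrary.

```python
import math


def norm2_2x2(B):
    """2-norm of a real 2-by-2 matrix: sqrt of the largest eigenvalue of B^T B."""
    (a, b), (c, d) = B
    p, s, r = a * a + c * c, b * b + d * d, a * b + c * d
    return math.sqrt((p + s + math.sqrt((p - s) ** 2 + 4 * r * r)) / 2)


def exp_As(a, M, s):
    """Closed form of e^{As} for A = [[-a, M], [0, -a]]."""
    e = math.exp(-a * s)
    return [[e, e * s * M], [0.0, e]]


def gamma_ratio(a, M, j):
    """(gamma / u) / ||G^(2^j)||_2 with G = e^{A/2^j} taken as exact."""
    prod = 1.0
    for i in range(1, j):                        # ||G^2|| ||G^4|| ... ||G^(2^(j-1))||
        prod *= norm2_2x2(exp_As(a, M, 2 ** i / 2 ** j))
    return prod / norm2_2x2(exp_As(a, M, 1.0))   # divide by ||G^(2^j)|| = ||e^A||


print(gamma_ratio(1.0, 0.0, 6))     # normal case (M = 0): e^{2/64}, about 1.03
print(gamma_ratio(1.0, 100.0, 6))   # non-normal case: a few thousand
```

In the normal case the error estimate is essentially $u \| e^A \|_2$; in the non-normal case the intermediate powers inflate it by more than three orders of magnitude.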


11.3.4 Eigenvalues and Pseudo-Eigenvalues 


We closed §7.1 with a comment that the eigenvalues of a matrix are generally
not good “informers” when it comes to measuring nearness to singularity,
unless the matrix is normal. It is the singular values that shed
light on $Ax = b$ sensitivity. Our discussion of the matrix exponential is
another warning to the same effect. The spectrum of a non-normal $A$ does
not completely describe $e^{At}$ behavior.

In many applications, the eigenvalues of a matrix “say something” about
an underlying phenomenon that is being modeled. If the eigenvalues are
extremely sensitive to perturbation, then what they say can be misleading.
This has prompted the development of the idea of pseudospectra. For $\epsilon > 0$,
the $\epsilon$-pseudospectrum of a matrix $A$ is a subset of the complex plane defined
by

$$ \lambda_\epsilon(A) = \left\{ z \in \mathbb{C} : \| (zI - A)^{-1} \|_2 \ge \frac{1}{\epsilon} \right\} . \qquad (11.3.5) $$

Qualitatively, $z$ is a pseudo-eigenvalue of $A$ if $zI - A$ is sufficiently close to
singular. By convention we set $\lambda_0(A) = \lambda(A)$. Here are some pseudospectra
properties:
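1. If $\epsilon_1 \le \epsilon_2$, then $\lambda_{\epsilon_1}(A) \subseteq \lambda_{\epsilon_2}(A)$.
2. $\lambda_\epsilon(A) = \{ z \in \mathbb{C} : \sigma_{\min}(zI - A) \le \epsilon \}$.
3. $\lambda_\epsilon(A) = \{ z \in \mathbb{C} : z \in \lambda(A + E) \text{ for some } E \text{ with } \| E \|_2 \le \epsilon \}$.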
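Property 2 gives a direct membership test. For a 2-by-2 matrix the smallest singular value has a closed form, so the following sketch (function names hypothetical) can compare a non-normal matrix with a normal one that has the same eigenvalues. The point $z = -0.5$ is at distance 0.5 from the eigenvalue $-1$ in both cases.

```python
import math


def sigma_min_2x2(B):
    """Smallest singular value of a (possibly complex) 2-by-2 matrix B,
    computed as |det B| / sigma_max to avoid cancellation."""
    (a, b), (c, d) = B
    p = abs(a) ** 2 + abs(c) ** 2                 # (B^H B)_11
    s = abs(b) ** 2 + abs(d) ** 2                 # (B^H B)_22
    r = a.conjugate() * b + c.conjugate() * d     # (B^H B)_12
    smax = math.sqrt((p + s + math.sqrt((p - s) ** 2 + 4 * abs(r) ** 2)) / 2)
    return abs(a * d - b * c) / smax if smax > 0 else 0.0


def in_pseudospectrum(A, z, eps):
    """Property 2: z lies in lambda_eps(A) iff sigma_min(zI - A) <= eps."""
    B = [[z - A[0][0], -A[0][1]], [-A[1][0], z - A[1][1]]]
    return sigma_min_2x2(B) <= eps


A_nonnormal = [[-1.0, 100.0], [0.0, -1.0]]
A_normal = [[-1.0, 0.0], [0.0, -1.0]]
z = -0.5 + 0.0j
print(in_pseudospectrum(A_nonnormal, z, 0.01))   # True
print(in_pseudospectrum(A_normal, z, 0.01))      # False
```

A plot of the pseudospectrum would repeat this test over a grid of $z$ values; for larger matrices $\sigma_{\min}$ would come from an SVD routine rather than the 2-by-2 formula.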


Plotting the pseudospectra of a non-normal matrix A can provide insight 
into behavior. Here “behavior” can mean anything from the mathematical 
behavior of an iteration to solve $Ax = b$ to the physical behavior predicted
by a model that involves A. See Higham and Trefethen (1993), Nachtigal, 
Reddy, and Trefethen (1992), and Trefethen, Trefethen, Reddy, and Driscoll 
(1993). 


Problems 


P11.3.1 Show that $e^{(A+B)t} = e^{At} e^{Bt}$ for all $t$ if and only if $AB = BA$. (Hint: Express
both sides as a power series in $t$ and compare the coefficient of $t^2$.)


P11.3.2 Suppose that $A$ is skew-symmetric. Show that both $e^A$ and the (1,1) Padé
approximant $R_{11}(A)$ are orthogonal. Are there any other values of $p$ and $q$ for which
$R_{pq}(A)$ is orthogonal?


P11.3.3 Show that if $A$ is nonsingular, then there exists a matrix $X$ such that $A = e^X$.
Is $X$ unique?


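P11.3.4 Show that if

$$ \exp\!\left( \begin{bmatrix} -A^T & P \\ 0 & A \end{bmatrix} t \right) = \begin{bmatrix} F_{11} & F_{12} \\ 0 & F_{22} \end{bmatrix} \qquad (\text{all blocks } n\text{-by-}n) $$

then

$$ F_{22}^T F_{12} = \int_0^t e^{A^T s} P e^{As} \, ds . $$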


P11.3.5 Give an algorithm for computing $e^A$ when $A = uv^T$, $u, v \in \mathbb{R}^n$.
P11.3.6 Suppose $A \in \mathbb{R}^{n \times n}$ and that $v \in \mathbb{R}^n$ has unit 2-norm. Define the function
$\phi(t) = \| e^{At} v \|_2^2 / 2$ and show that
$$ \dot\phi(t) \le 2\mu(A) \phi(t) , $$
where $\mu(A) = \lambda_1((A + A^T)/2)$. Conclude that $\| e^{At} \|_2 \le e^{\mu(A) t}$ for $t \ge 0$.
P11.3.7 Prove the three pseudospectra properties given in the text. 


Notes and References for Sec. 11.3 


Much of what appears in this section and an extensive bibliography may be found in the 
following survey article: 


C.B. Moler and C.F. Van Loan (1978). “Nineteen Dubious Ways to Compute the Expo- 
nential of a Matrix,” SIAM Review 20, 801-36. 


Scaling and squaring with Padé approximants (Algorithm 11.3.1) and a careful
implementation of Parlett’s Schur decomposition method (Algorithm 11.1.1) were found to be
among the less dubious of the nineteen methods scrutinized. Various aspects of Padé
approximation of the matrix exponential are discussed in


W. Fair and Y. Luke (1970). “Padé Approximations to the Operator Exponential,” 
Numer. Math. 14, 379-82. 

C.F. Van Loan (1977). “On the Limitation and Application of Padé Approximation to
the Matrix Exponential,” in Padé and Rational Approximation, ed. E.B. Saff and
R.S. Varga, Academic Press, New York.

R.C. Ward (1977). “Numerical Computation of the Matrix Exponential with Accuracy 
Estimate,” SIAM J. Num. Anal. 14, 600-14. 

A. Wragg (1973). “Computation of the Exponential of a Matrix I: Theoretical Consid- 
erations,” J. Inst. Math. Applic. 11, 369-75. 

A. Wragg (1975). “Computation of the Exponential of a Matrix II: Practical Consider- 
ations,” J. Inst. Math. Applic. 15, 273-78. 


A proof of equation (11.3.1) for the scalar case appears in 


R.S. Varga (1961). “On Higher-Order Stable Implicit Methods for Solving Parabolic 
Partial Differential Equations,” J. Math. Phys. 40, 220-31. 


There are many applications in control theory calling for the computation of the
matrix exponential. In the linear optimal regulator problem, for example, various integrals
involving the matrix exponential are required. See


J. Johnson and C.L. Phillips (1971). “An Algorithm for the Computation of the Integral 
of the State Transition Matrix,” IEEE Trans. Auto. Cont. AC-16, 204-5. 

C.F. Van Loan (1978). “Computing Integrals Involving the Matrix Exponential,” IEEE 
Trans. Auto. Cont. AC-23, 395-404. 


An understanding of the map $A \to \exp(At)$ and its sensitivity is helpful when assessing
the performance of algorithms for computing the matrix exponential. Work in this
direction includes


B. Kågström (1977). “Bounds and Perturbation Bounds for the Matrix Exponential,”
BIT 17, 39-57.


C.F. Van Loan (1977). “The Sensitivity of the Matrix Exponential,” SIAM J. Num.
Anal. 14, 971-81.


R. Mathias (1992). “Evaluating the Frechet Derivative of the Matrix Exponential,” 
Numer. Math. 63, 213-226. 


The computation of a logarithm of a matrix is an important area demanding much more 
work. These calculations arise in various “system identification” problems. See 


B. Singer and S. Spilerman (1976). “The Representation of Social Processes by Markov 
Models,” Amer. J. Sociology 82, 1-54. 
B.W. Helton (1968). “Logarithms of Matrices,” Proc. Amer. Math. Soc. 19, 733-36. 


For pointers into the pseudospectra literature we recommend 


L.N. Trefethen (1992). “Pseudospectra of Matrices,” in Numerical Analysis 1991, D.F.
Griffiths and G.A. Watson (eds), Longman Scientific and Technical, Harlow, Essex,
UK, 234-262.

D.J. Higham and L.N. Trefethen (1993). “Stiffness of ODEs,” BIT 33, 285-303.

L.N. Trefethen, A.E. Trefethen, S.C. Reddy, and T.A. Driscoll (1993). “Hydrodynamic
Stability Without Eigenvalues,” Science 261, 578-584.


as well as Chaitin-Chatelin and Frayssé (1996, chapter 10). 


Chapter 12 
Special Topics 


§12.1 Constrained Least Squares
§12.2 Subset Selection Using the SVD
§12.3 Total Least Squares
§12.4 Computing Subspaces with the SVD
§12.5 Updating Matrix Factorizations
§12.6 Modified/Structured Eigenproblems


In this final chapter we discuss an assortment of problems that represent
important applications of the singular value, QR, and Schur decompositions.
We first consider least squares minimization with constraints. Two
types of constraints are considered in §12.1, quadratic inequality and linear
equality. The next two sections are also concerned with variations on the
standard LS problem. In §12.2 we consider how the vector of observations
$b$ might be approximated by some subset of $A$'s columns, a course of action
that is sometimes appropriate if $A$ is rank-deficient. In §12.3 we consider
a variation of ordinary regression known as total least squares that has
appeal when $A$ is contaminated with error. More applications of the SVD
are considered in §12.4, where various subspace calculations are considered.
In §12.5 we investigate the updating of orthogonal factorizations when the
matrix $A$ undergoes a low-rank perturbation. Some variations of the basic
eigenvalue problem are discussed in §12.6.


Before You Begin 


Because of the topical nature of this chapter, it doesn't make sense to 
have a chapter-wide, before-you-begin advisory. Instead, each section will 
begin with pointers to earlier portions of the book, and, if appropriate, 
pointers to LAPACK and other texts. 
12.1 Constrained Least Squares


In the least squares setting it is sometimes natural to minimize $\| Ax - b \|_2$
over a proper subset of $\mathbb{R}^n$. For example, we may wish to predict $b$ as best
we can with $Ax$ subject to the constraint that $x$ is a unit vector. Or, perhaps
the solution defines a fitting function $f(t)$ which is to have prescribed values
at a finite number of points. This can lead to an equality constrained least
squares problem. In this section we show how these problems can be solved
using the QR factorization and the SVD.

Chapter 5 and §8.7 should be understood before reading this section.
LAPACK connections include: 


LAPACK: Tools for Generalized/Constrained LS Problems

    _GGLSE   Solves the equality constrained LS problem
    _GGQRF   Computes the generalized QR factorization of a matrix pair
    _GGRQF   Computes the generalized RQ factorization of a matrix pair
    _GGSVP   Converts the GSVD problem to triangular form
    _TGSJA   Computes the GSVD of a pair of triangular matrices


Complementary references include Lawson and Hanson (1974) and Björck
(1996).


12.1.1 The Problem LSQI 


Least squares minimization with a quadratic inequality constraint (the
LSQI problem) is a technique that can be used whenever the solution to
the ordinary LS problem needs to be regularized. A simple LSQI problem
that arises when attempting to fit a function to noisy data is

$$ \text{minimize } \| Ax - b \|_2 \quad \text{subject to} \quad \| Bx \|_2 \le \alpha \qquad (12.1.1) $$

where $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$, $B \in \mathbb{R}^{n \times n}$ (nonsingular), and $\alpha \ge 0$. The
constraint defines a hyperellipsoid in $\mathbb{R}^n$ and is usually chosen to damp out
excessive oscillation in the fitting function. This can be done, for example,
if $B$ is a discretized second derivative operator.

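More generally, we have the problem

$$ \text{minimize } \| Ax - b \|_2 \quad \text{subject to} \quad \| Bx - d \|_2 \le \alpha \qquad (12.1.2) $$

where $A \in \mathbb{R}^{m \times n}$ $(m \ge n)$, $b \in \mathbb{R}^m$, $B \in \mathbb{R}^{p \times n}$, $d \in \mathbb{R}^p$, and $\alpha \ge 0$. The
generalized singular value decomposition of §8.7.3 sheds light on the
solvability of (12.1.2). Indeed, if

$$ \begin{aligned} U^T A X &= \mathrm{diag}(\alpha_1, \ldots, \alpha_n), & U^T U &= I_m, \\ V^T B X &= \mathrm{diag}(\beta_1, \ldots, \beta_q), & V^T V &= I_p, \quad q = \min\{p, n\} \end{aligned} \qquad (12.1.3) $$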
is the generalized singular value decomposition of $A$ and $B$, then (12.1.2)
transforms to

$$ \text{minimize } \| D_A y - \tilde b \|_2 \quad \text{subject to} \quad \| D_B y - \tilde d \|_2 \le \alpha $$

where $\tilde b = U^T b$, $\tilde d = V^T d$, and $y = X^{-1} x$. The simple form of the objective
function

$$ \| D_A y - \tilde b \|_2^2 = \sum_{i=1}^{n} (\alpha_i y_i - \tilde b_i)^2 + \sum_{i=n+1}^{m} \tilde b_i^2 \qquad (12.1.4) $$

and the constraint equation

$$ \| D_B y - \tilde d \|_2^2 = \sum_{i=1}^{r} (\beta_i y_i - \tilde d_i)^2 + \sum_{i=r+1}^{p} \tilde d_i^2 \qquad (12.1.5) $$

facilitate the analysis of the LSQI problem. Here, $r = \mathrm{rank}(B)$ and we
assume that $\beta_{r+1} = \cdots = \beta_q = 0$.
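To begin with, the problem has a solution if and only if

$$ \sum_{i=r+1}^{p} \tilde d_i^2 \le \alpha^2 . $$

If we have equality in this expression then consideration of (12.1.4) and
(12.1.5) shows that the vector defined by

$$ y_i = \begin{cases} \tilde d_i / \beta_i & i = 1{:}r \\ \tilde b_i / \alpha_i & i = r+1{:}n, \ \alpha_i \ne 0 \\ 0 & i = r+1{:}n, \ \alpha_i = 0 \end{cases} \qquad (12.1.6) $$

solves the LSQI problem. Otherwise

$$ \sum_{i=r+1}^{p} \tilde d_i^2 < \alpha^2 \qquad (12.1.7) $$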


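and we have more alternatives to pursue. The vector $y \in \mathbb{R}^n$, defined by

$$ y_i = \begin{cases} \tilde b_i / \alpha_i & \alpha_i \ne 0 \\ \tilde d_i / \beta_i & \alpha_i = 0 \end{cases} \qquad i = 1{:}n , $$

is a minimizer of $\| D_A y - \tilde b \|_2$. If this vector is also feasible, then we have
a solution to (12.1.2). (This is not necessarily the solution of minimum
2-norm, however.) We therefore assume that

$$ \sum_{\substack{i=1 \\ \alpha_i \ne 0}}^{q} \left( \beta_i \frac{\tilde b_i}{\alpha_i} - \tilde d_i \right)^2 + \sum_{i=q+1}^{p} \tilde d_i^2 > \alpha^2 . \qquad (12.1.8) $$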
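This implies that the solution to the LSQI problem occurs on the boundary
of the feasible set. Thus, our remaining goal is to

$$ \text{minimize } \| D_A y - \tilde b \|_2 \quad \text{subject to} \quad \| D_B y - \tilde d \|_2 = \alpha . $$

To solve this problem, we use the method of Lagrange multipliers. Defining

$$ h(\lambda, y) = \| D_A y - \tilde b \|_2^2 + \lambda \left( \| D_B y - \tilde d \|_2^2 - \alpha^2 \right) $$

we see that the equations $0 = \partial h / \partial y_i$, $i = 1{:}n$, lead to the linear system

$$ (D_A^T D_A + \lambda D_B^T D_B)\, y = D_A^T \tilde b + \lambda D_B^T \tilde d . $$

Assuming that the matrix of coefficients is nonsingular, this has a solution
$y(\lambda)$ where

$$ y_i(\lambda) = \begin{cases} \dfrac{\alpha_i \tilde b_i + \lambda \beta_i \tilde d_i}{\alpha_i^2 + \lambda \beta_i^2} & i = 1{:}q \\[2ex] \tilde b_i / \alpha_i & i = q+1{:}n \end{cases} $$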


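To determine the Lagrange parameter we define

$$ \phi(\lambda) \equiv \| D_B y(\lambda) - \tilde d \|_2^2 = \sum_{i=1}^{r} \left( \alpha_i \frac{\beta_i \tilde b_i - \alpha_i \tilde d_i}{\alpha_i^2 + \lambda \beta_i^2} \right)^2 + \sum_{i=r+1}^{p} \tilde d_i^2 $$

and seek a solution to $\phi(\lambda) = \alpha^2$. Equations of this type are referred to as
secular equations and we encountered them earlier in §8.5.3. From (12.1.8)
we see that $\phi(0) > \alpha^2$. Now $\phi(\lambda)$ is monotone decreasing for $\lambda > 0$ and
tends to $\sum_{i=r+1}^{p} \tilde d_i^2$ as $\lambda \to \infty$, so (12.1.7) and (12.1.8) together imply the
existence of a unique positive $\lambda^*$ for which $\phi(\lambda^*) = \alpha^2$. It is easy to show that
this is the desired root. It can be found through the application of any standard
root-finding technique, such as Newton's method. The solution of the original
LSQI problem is then $x = X y(\lambda^*)$.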
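When $p = n$ and every $\alpha_i$ and $\beta_i$ is positive, the tail sums in $\phi$ vanish and the whole procedure reduces to a few lines. The sketch below assumes exactly that case (it does not handle $\alpha_i = 0$ or $r < q$), uses bisection because $\phi$ is monotone decreasing for $\lambda > 0$ (Newton's method, mentioned above, would also do), and names the inputs `al`, `be`, `bt`, `dt` for $\alpha_i$, $\beta_i$, $\tilde b_i$, $\tilde d_i$; all names are our own.

```python
def phi(lam, al, be, bt, dt):
    """phi(lambda) = ||D_B y(lambda) - d~||_2^2 in the case p = n, r = q = n."""
    return sum((a * (b * u - a * v) / (a * a + lam * b * b)) ** 2
               for a, b, u, v in zip(al, be, bt, dt))


def lsqi_diagonal(al, be, bt, dt, radius, tol=1e-13):
    """Boundary solution of min ||D_A y - b~||_2 s.t. ||D_B y - d~||_2 <= radius.

    Assumes radius > 0, p = n with every alpha_i > 0 and beta_i > 0, and that
    (12.1.8) holds.  Returns (lambda*, y(lambda*)).
    """
    target = radius ** 2
    if phi(0.0, al, be, bt, dt) <= target:
        raise ValueError("(12.1.8) fails: the unconstrained minimizer is feasible")
    lo, hi = 0.0, 1.0
    while phi(hi, al, be, bt, dt) > target:   # phi decreases to 0: bracket the root
        hi *= 2.0
    while hi - lo > tol * hi:                  # bisection on the secular equation
        mid = 0.5 * (lo + hi)
        if phi(mid, al, be, bt, dt) > target:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    y = [(a * u + lam * b * v) / (a * a + lam * b * b)
         for a, b, u, v in zip(al, be, bt, dt)]
    return lam, y


# A hypothetical instance with p = n = 2.
lam, y = lsqi_diagonal([2.0, 1.0], [1.0, 2.0], [4.0, 2.0], [0.5, 0.0], 0.5)
print(lam, y)
# With B = I, d = 0 and radius 1 this is a minimization over the unit sphere.
lam1, y1 = lsqi_diagonal([2.0, 1.0], [1.0, 1.0], [4.0, 2.0], [0.0, 0.0], 1.0)
print(lam1)
```

The second call, with $A = \mathrm{diag}(2, 1)$ and $\tilde b = (4, 2)^T$, gives a root near 4.57132.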


12.1.2 LS Minimization Over a Sphere 


For the important case of minimization over a sphere ($B = I_n$, $d = 0$), we
have the following procedure:

Algorithm 12.1.1 Given $A \in \mathbb{R}^{m \times n}$ with $m \ge n$, $b \in \mathbb{R}^m$, and $\alpha > 0$,
the following algorithm computes a vector $x \in \mathbb{R}^n$ such that $\| Ax - b \|_2$ is
minimum, subject to the constraint that $\| x \|_2 \le \alpha$.

    Compute the SVD A = UΣV^T, save V = [v_1, ..., v_n], and form b̃ = U^T b.
    r = rank(A)
    if Σ_{i=1}^{r} (b̃_i/σ_i)^2 > α^2
        Find λ* such that Σ_{i=1}^{r} (σ_i b̃_i/(σ_i^2 + λ*))^2 = α^2
        x = Σ_{i=1}^{r} (σ_i b̃_i/(σ_i^2 + λ*)) v_i
    else
        x = Σ_{i=1}^{r} (b̃_i/σ_i) v_i
    end


The SVD is the dominant computation in this algorithm. 


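Example 12.1.1 The secular equation for the problem

$$ \min_{\| x \|_2 = 1} \left\| \begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} - \begin{bmatrix} 4 \\ 2 \end{bmatrix} \right\|_2 $$

is given by

$$ \left( \frac{8}{\lambda + 4} \right)^2 + \left( \frac{2}{\lambda + 1} \right)^2 = 1 . $$

For this problem we find $\lambda^* = 4.57132$ and $x = [.93334 \;\; .35898]^T$.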
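Algorithm 12.1.1 leaves the root finder unspecified. The sketch below assumes the SVD has already been computed (its inputs are the nonzero $\sigma_i$, the leading components of $\tilde b = U^T b$, and the corresponding right singular vectors) and applies Newton's method directly to the secular equation, starting from $\lambda = 0$. That starting point is our choice: the function is convex and decreasing there, so the iterates increase monotonically toward $\lambda^*$. For the data of Example 12.1.1, $A = \mathrm{diag}(2, 1)$ is its own SVD, $b = (4, 2)^T$ and $\alpha = 1$. The name `sphere_ls` is hypothetical.

```python
import math


def sphere_ls(sigma, bt, V, alpha, tol=1e-14, maxit=100):
    """Algorithm 12.1.1, assuming the SVD is already available.

    sigma : the r nonzero singular values sigma_1, ..., sigma_r
    bt    : the first r components of b~ = U^T b
    V     : the right singular vectors v_1, ..., v_r, each a list of length n
    """
    r = len(sigma)
    if sum((bt[i] / sigma[i]) ** 2 for i in range(r)) > alpha ** 2:
        # Newton's method on f(lam) = sum (sigma_i b~_i / (sigma_i^2 + lam))^2 - alpha^2.
        lam = 0.0
        for _ in range(maxit):
            f = sum((sigma[i] * bt[i] / (sigma[i] ** 2 + lam)) ** 2 for i in range(r)) - alpha ** 2
            fp = -2.0 * sum((sigma[i] * bt[i]) ** 2 / (sigma[i] ** 2 + lam) ** 3 for i in range(r))
            step = f / fp
            lam -= step
            if abs(step) <= tol * max(1.0, lam):
                break
        coef = [sigma[i] * bt[i] / (sigma[i] ** 2 + lam) for i in range(r)]
    else:
        lam = 0.0                                # the LS solution is already feasible
        coef = [bt[i] / sigma[i] for i in range(r)]
    n = len(V[0])
    x = [sum(coef[i] * V[i][k] for i in range(r)) for k in range(n)]
    return lam, x


# Example 12.1.1: U = V = I, so b~ = b.
lam, x = sphere_ls([2.0, 1.0], [4.0, 2.0], [[1.0, 0.0], [0.0, 1.0]], 1.0)
print(lam, x)
```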


12.1.3 Ridge Regression 


The problem solved by Algorithm 12.1.1 is equivalent to the Lagrange multiplier
problem of determining $\lambda > 0$ such that

$$ (A^T A + \lambda I) x = A^T b \qquad (12.1.9) $$

and $\| x \|_2 = \alpha$. This equation is precisely the normal equation formulation
for the ridge regression problem

$$ \min_x \left\| \begin{bmatrix} A \\ \sqrt{\lambda} I \end{bmatrix} x - \begin{bmatrix} b \\ 0 \end{bmatrix} \right\|_2^2 = \min_x \left( \| Ax - b \|_2^2 + \lambda \| x \|_2^2 \right) . $$

In the general ridge regression problem one has some criteria for selecting
the ridge parameter $\lambda$, e.g., $\| x(\lambda) \|_2 = \alpha$ for some given $\alpha$. We describe a
$\lambda$-selection procedure that is discussed in Golub, Heath, and Wahba (1979).

Set $D_k = I - e_k e_k^T = \mathrm{diag}(1, \ldots, 1, 0, 1, \ldots, 1) \in \mathbb{R}^{m \times m}$ and let $x_k(\lambda)$
solve

$$ \min_x \| D_k (Ax - b) \|_2^2 + \lambda \| x \|_2^2 . \qquad (12.1.10) $$
Thus, $x_k(\lambda)$ is the solution to the ridge regression problem with the $k$th row
of $A$ and $k$th component of $b$ deleted, i.e., the $k$th experiment is ignored.
Now consider choosing $\lambda$ so as to minimize the cross-validation weighted
square error $C(\lambda)$ defined by

$$ C(\lambda) = \frac{1}{m} \sum_{k=1}^{m} w_k \left( a_k^T x_k(\lambda) - b_k \right)^2 . $$

Here, $w_1, \ldots, w_m$ are non-negative weights and $a_k^T$ is the $k$th row of $A$.
Noting that

$$ \| A x_k(\lambda) - b \|_2^2 = \| D_k (A x_k(\lambda) - b) \|_2^2 + \left( a_k^T x_k(\lambda) - b_k \right)^2 $$

we see that $(a_k^T x_k(\lambda) - b_k)^2$ is the increase in the sum of squares resulting
when the $k$th row is “reinstated.” Minimizing $C(\lambda)$ is tantamount to
choosing $\lambda$ such that the final model is not overly dependent on any one
experiment.

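A more rigorous analysis can make this statement precise and also suggest
a method for minimizing $C(\lambda)$. Assuming that $\lambda > 0$, an algebraic
manipulation shows that

$$ x_k(\lambda) = x(\lambda) + \frac{a_k^T x(\lambda) - b_k}{1 - z_k^T a_k}\, z_k \qquad (12.1.11) $$

where $z_k = (A^T A + \lambda I)^{-1} a_k$ and $x(\lambda) = (A^T A + \lambda I)^{-1} A^T b$. Applying
$-a_k^T$ to (12.1.11) and then adding $b_k$ to each side of the resulting equation
gives

$$ b_k - a_k^T x_k(\lambda) = \frac{e_k^T \left( I - A (A^T A + \lambda I)^{-1} A^T \right) b}{e_k^T \left( I - A (A^T A + \lambda I)^{-1} A^T \right) e_k} . \qquad (12.1.12) $$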


Noting that the residual $r = (r_1, \ldots, r_m)^T = b - A x(\lambda)$ is given by the
formula $r = [I - A (A^T A + \lambda I)^{-1} A^T] b$, we see that

$$ C(\lambda) = \frac{1}{m} \sum_{k=1}^{m} w_k \left( \frac{r_k}{\partial r_k / \partial b_k} \right)^2 . $$

The quotient $r_k / (\partial r_k / \partial b_k)$ may be regarded as an inverse measure of the
“impact” of the $k$th observation $b_k$ on the model. When $\partial r_k / \partial b_k$ is small,
this says that the error in the model’s prediction of $b_k$ is somewhat
independent of $b_k$. The tendency for this to be true is lessened by basing the
model on the $\lambda^*$ that minimizes $C(\lambda)$.
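Formula (12.1.12) means that $C(\lambda)$ can be evaluated from a single ridge fit. The following sketch checks this on made-up data by computing $C(\lambda)$ twice: once from the definition, refitting with each row deleted, and once from the residuals $r_k$ and the quantities $1 - a_k^T z_k$. It solves the normal equations (12.1.9) directly, which is adequate for this tiny, well-conditioned example but is not a general recommendation; all function names are hypothetical.

```python
def solve(M, v):
    """Solve the small dense system M x = v by Gaussian elimination with pivoting."""
    n = len(M)
    T = [M[i][:] + [v[i]] for i in range(n)]
    for k in range(n):
        piv = max(range(k, n), key=lambda i: abs(T[i][k]))
        T[k], T[piv] = T[piv], T[k]
        for i in range(k + 1, n):
            f = T[i][k] / T[k][k]
            for j in range(k, n + 1):
                T[i][j] -= f * T[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (T[i][n] - sum(T[i][j] * x[j] for j in range(i + 1, n))) / T[i][i]
    return x


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def gram(rows, lam):
    """A^T A + lam I for A given by its rows a_k^T."""
    n = len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) + (lam if i == j else 0.0)
             for j in range(n)] for i in range(n)]


def ridge(rows, b, lam):
    """x(lam) from the normal equations (12.1.9)."""
    n = len(rows[0])
    return solve(gram(rows, lam), [sum(r[i] * bk for r, bk in zip(rows, b)) for i in range(n)])


def cv_brute(rows, b, w, lam):
    """C(lam) straight from its definition: refit with row k deleted, m times."""
    m = len(rows)
    total = 0.0
    for k in range(m):
        xk = ridge(rows[:k] + rows[k + 1:], b[:k] + b[k + 1:], lam)
        total += w[k] * (dot(rows[k], xk) - b[k]) ** 2
    return total / m


def cv_shortcut(rows, b, w, lam):
    """C(lam) from one fit, via (12.1.12): b_k - a_k^T x_k = r_k / (1 - a_k^T z_k)."""
    G, x = gram(rows, lam), ridge(rows, b, lam)
    total = 0.0
    for k in range(len(rows)):
        z = solve(G, rows[k])                  # z_k = (A^T A + lam I)^{-1} a_k
        r_k = b[k] - dot(rows[k], x)           # k-th residual of the full fit
        total += w[k] * (r_k / (1.0 - dot(rows[k], z))) ** 2
    return total / len(rows)


# Hypothetical data: fit b ~ x_1 + x_2 t at five points t.
rows = [[1.0, 0.5], [1.0, 1.5], [1.0, 2.0], [1.0, 3.5], [1.0, 4.0]]
b = [1.1, 1.9, 3.2, 3.9, 5.1]
w = [1.0] * 5
for lam in (0.01, 0.3, 3.0):
    print(lam, cv_brute(rows, b, w, lam), cv_shortcut(rows, b, w, lam))
```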

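The actual determination of $\lambda^*$ is simplified by computing the SVD of
$A$. Indeed, if $U^T A V = \mathrm{diag}(\sigma_1, \ldots, \sigma_n)$ with $\sigma_1 \ge \cdots \ge \sigma_n$ and $\tilde b = U^T b$,
then it can be shown from (12.1.12) that

$$ C(\lambda) = \frac{1}{m} \sum_{k=1}^{m} w_k \left( \frac{b_k - \sum_{j=1}^{n} u_{kj} \tilde b_j \dfrac{\sigma_j^2}{\sigma_j^2 + \lambda}}{1 - \sum_{j=1}^{n} u_{kj}^2 \dfrac{\sigma_j^2}{\sigma_j^2 + \lambda}} \right)^2 , $$

where $u_{kj}$ denotes the $(k,j)$ entry of $U$. The minimization of this expression is discussed in Golub, Heath, and
Wahba (1979).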


12.1.4 Equality Constrained Least Squares 


We conclude the section by considering the least squares problem with
linear equality constraints:

$$ \min_{Bx = d} \| Ax - b \|_2 \qquad (12.1.13) $$

Here $A \in \mathbb{R}^{m \times n}$, $B \in \mathbb{R}^{p \times n}$, $b \in \mathbb{R}^m$, $d \in \mathbb{R}^p$, and $\mathrm{rank}(B) = p$. We refer
to (12.1.13) as the LSE problem. By setting $\alpha = 0$ in (12.1.2) we see
that the LSE problem is a special case of the LSQI problem. However,
it is simpler to approach the LSE problem directly rather than through
Lagrange multipliers.

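Assume for clarity that both $A$ and $B$ have full rank. Let

$$ Q^T B^T = \begin{bmatrix} R \\ 0 \end{bmatrix} \begin{matrix} p \\ n-p \end{matrix} $$

be the QR factorization of $B^T$ and set

$$ AQ = [\, A_1 \;\; A_2 \,], \qquad Q^T x = \begin{bmatrix} y \\ z \end{bmatrix} , $$

where $A_1 \in \mathbb{R}^{m \times p}$, $A_2 \in \mathbb{R}^{m \times (n-p)}$, $y \in \mathbb{R}^p$, and $z \in \mathbb{R}^{n-p}$.
It is clear that with these transformations (12.1.13) becomes

$$ \min_{R^T y = d} \| A_1 y + A_2 z - b \|_2 . $$

Thus, $y$ is determined from the constraint equation $R^T y = d$ and the vector
$z$ is obtained by solving the unconstrained LS problem

$$ \min_z \| A_2 z - (b - A_1 y) \|_2 . $$

Combining the above, we see that $x = Q \begin{bmatrix} y \\ z \end{bmatrix}$ solves (12.1.13).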


Algorithm 12.1.2 Suppose $A \in \mathbb{R}^{m \times n}$, $B \in \mathbb{R}^{p \times n}$, $b \in \mathbb{R}^m$, and $d \in \mathbb{R}^p$.
If $\mathrm{rank}(A) = n$ and $\mathrm{rank}(B) = p$, then the following algorithm minimizes
$\| Ax - b \|_2$ subject to the constraint $Bx = d$.

    B^T = QR   (QR factorization)
    Solve R(1:p, 1:p)^T y = d for y.
    A = AQ
    Find z so ‖A(:, p+1:n)z - (b - A(:, 1:p)y)‖_2 is minimized.
    x = Q(:, 1:p)y + Q(:, p+1:n)z

Note that this approach to the LSE problem involves two factorizations and
a matrix multiplication.
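Here is a minimal pure-Python rendering of Algorithm 12.1.2, assuming full-rank $A$ and $B$ and dense matrices stored as lists of rows. The Householder QR routine is a bare-bones sketch written for this illustration (it forms $Q$ explicitly, which a careful implementation would avoid), and all function names are our own. The data are those of Example 12.1.2, whose constraint $x_1 = x_2$ is $Bx = d$ with $B = [1 \;\, -1]$ and $d = 0$.

```python
import math


def householder_qr(M):
    """Full QR of an m-by-k matrix M (m >= k) by Householder reflections.
    Returns Q (m-by-m, orthogonal) and R (m-by-k, upper triangular)."""
    m, k = len(M), len(M[0])
    R = [row[:] for row in M]
    Q = [[float(i == j) for j in range(m)] for i in range(m)]
    for j in range(min(m - 1, k)):
        x = [R[i][j] for i in range(j, m)]
        normx = math.sqrt(sum(t * t for t in x))
        if normx == 0.0:
            continue
        v = x[:]
        v[0] += math.copysign(normx, x[0])       # sign choice avoids cancellation
        vv = sum(t * t for t in v)
        for c in range(k):                       # R <- H R on rows j:m
            f = 2.0 * sum(v[i] * R[j + i][c] for i in range(len(v))) / vv
            for i in range(len(v)):
                R[j + i][c] -= f * v[i]
        for row in Q:                            # Q <- Q H on columns j:m
            f = 2.0 * sum(row[j + i] * v[i] for i in range(len(v))) / vv
            for i in range(len(v)):
                row[j + i] -= f * v[i]
    return Q, R


def ls_solve(M, c):
    """Minimize ||M z - c||_2 for full-column-rank M via its QR factorization."""
    k = len(M[0])
    Q, R = householder_qr(M)
    qtc = [sum(Q[i][s] * c[i] for i in range(len(c))) for s in range(k)]
    z = [0.0] * k
    for s in reversed(range(k)):
        z[s] = (qtc[s] - sum(R[s][t] * z[t] for t in range(s + 1, k))) / R[s][s]
    return z


def lse(A, b, B, d):
    """Algorithm 12.1.2: minimize ||Ax - b||_2 subject to Bx = d."""
    m, n, p = len(A), len(A[0]), len(B)
    Bt = [[B[i][j] for i in range(p)] for j in range(n)]      # B^T is n-by-p
    Q, R = householder_qr(Bt)                                 # B^T = QR
    y = [0.0] * p
    for i in range(p):                                        # R(1:p,1:p)^T y = d
        y[i] = (d[i] - sum(R[k][i] * y[k] for k in range(i))) / R[i][i]
    AQ = [[sum(A[r][k] * Q[k][c] for k in range(n)) for c in range(n)] for r in range(m)]
    A2 = [row[p:] for row in AQ]
    rhs = [b[r] - sum(AQ[r][k] * y[k] for k in range(p)) for r in range(m)]
    z = ls_solve(A2, rhs)                                     # unconstrained LS for z
    return [sum(Q[i][k] * y[k] for k in range(p)) +
            sum(Q[i][p + k] * z[k] for k in range(n - p)) for i in range(n)]


x = lse([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], [7.0, 1.0, 3.0], [[1.0, -1.0]], [0.0])
print(x)
```

Substituting $x_1 = x_2 = t$ shows that the exact answer is $t = 61/179 \approx .3407821$, which the sketch reproduces to rounding error.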


12.1.5 The Method of Weighting 


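An interesting way to obtain an approximate solution to (12.1.13) is to
solve the unconstrained LS problem

$$ \min_x \left\| \begin{bmatrix} A \\ \lambda B \end{bmatrix} x - \begin{bmatrix} b \\ \lambda d \end{bmatrix} \right\|_2 \qquad (12.1.14) $$

for large $\lambda$. The generalized singular value decomposition of §8.7.3 sheds
light on the quality of the approximation. Let

$$ \begin{aligned} U^T A X &= \mathrm{diag}(\alpha_1, \ldots, \alpha_n) = D_A \in \mathbb{R}^{m \times n} \\ V^T B X &= \mathrm{diag}(\beta_1, \ldots, \beta_p) = D_B \in \mathbb{R}^{p \times n} \end{aligned} $$

be the GSVD of $(A, B)$ and assume that both matrices have full rank for
clarity. If $U = [u_1, \ldots, u_m]$, $V = [v_1, \ldots, v_p]$, and $X = [x_1, \ldots, x_n]$, then
it is easy to show that

$$ x = \sum_{i=1}^{p} \frac{v_i^T d}{\beta_i} x_i + \sum_{i=p+1}^{n} \frac{u_i^T b}{\alpha_i} x_i \qquad (12.1.15) $$

is the exact solution to (12.1.13), while

$$ x(\lambda) = \sum_{i=1}^{p} \frac{\alpha_i u_i^T b + \lambda^2 \beta_i v_i^T d}{\alpha_i^2 + \lambda^2 \beta_i^2} x_i + \sum_{i=p+1}^{n} \frac{u_i^T b}{\alpha_i} x_i \qquad (12.1.16) $$

solves (12.1.14). Since

$$ x(\lambda) - x = \sum_{i=1}^{p} \frac{\alpha_i (\beta_i u_i^T b - \alpha_i v_i^T d)}{\beta_i (\alpha_i^2 + \lambda^2 \beta_i^2)} x_i \qquad (12.1.17) $$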




it follows that $x(\lambda) \to x$ as $\lambda \to \infty$.

The appeal of this approach to the LSE problem is that no special sub- 
routines are required: an ordinary LS solver will do. However, for large 
values of $\lambda$ numerical problems can arise and it is necessary to take
precautions. See Powell and Reid (1968) and Van Loan (1982a).


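Example 12.1.2 The problem

$$ \min_{x_1 = x_2} \left\| \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} - \begin{bmatrix} 7 \\ 1 \\ 3 \end{bmatrix} \right\|_2 $$

has solution $x = [.3407821, \; .3407821]^T$. This can be approximated by solving

$$ \min_x \left\| \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \\ 1000 & -1000 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} - \begin{bmatrix} 7 \\ 1 \\ 3 \\ 0 \end{bmatrix} \right\|_2 $$

which has solution $x = [.3407810, \; .3407829]^T$.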
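The method of weighting needs nothing beyond an ordinary LS solver. The sketch below uses a bare-bones Householder QR as that solver (our own stand-in, with hypothetical names), solves (12.1.14) for the data of Example 12.1.2 with $\lambda = 10, 100, 1000$, and reports the distance to the exact solution $x_1 = x_2 = 61/179$.

```python
import math


def householder_qr(M):
    """Full QR of an m-by-k matrix M (m >= k) by Householder reflections."""
    m, k = len(M), len(M[0])
    R = [row[:] for row in M]
    Q = [[float(i == j) for j in range(m)] for i in range(m)]
    for j in range(min(m - 1, k)):
        x = [R[i][j] for i in range(j, m)]
        normx = math.sqrt(sum(t * t for t in x))
        if normx == 0.0:
            continue
        v = x[:]
        v[0] += math.copysign(normx, x[0])
        vv = sum(t * t for t in v)
        for c in range(k):                       # R <- H R
            f = 2.0 * sum(v[i] * R[j + i][c] for i in range(len(v))) / vv
            for i in range(len(v)):
                R[j + i][c] -= f * v[i]
        for row in Q:                            # Q <- Q H
            f = 2.0 * sum(row[j + i] * v[i] for i in range(len(v))) / vv
            for i in range(len(v)):
                row[j + i] -= f * v[i]
    return Q, R


def ls_solve(M, c):
    """Minimize ||M z - c||_2 for full-column-rank M."""
    k = len(M[0])
    Q, R = householder_qr(M)
    qtc = [sum(Q[i][s] * c[i] for i in range(len(c))) for s in range(k)]
    z = [0.0] * k
    for s in reversed(range(k)):
        z[s] = (qtc[s] - sum(R[s][t] * z[t] for t in range(s + 1, k))) / R[s][s]
    return z


def weighted(A, b, B, d, lam):
    """Solve (12.1.14): min || [A; lam B] x - [b; lam d] ||_2."""
    M = [row[:] for row in A] + [[lam * v for v in row] for row in B]
    c = list(b) + [lam * v for v in d]
    return ls_solve(M, c)


A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
b = [7.0, 1.0, 3.0]
B, d = [[1.0, -1.0]], [0.0]
exact = 61 / 179
for lam in (10.0, 100.0, 1000.0):
    x = weighted(A, b, B, d, lam)
    print(lam, x, max(abs(v - exact) for v in x))
```

The error drops by roughly a factor of 100 each time $\lambda$ grows by 10, consistent with the $1/\lambda^2$ behavior visible in (12.1.17), and the $\lambda = 1000$ result agrees with the values quoted in the example.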


Problems 


P12.1.1 (a) Show that if $\mathrm{null}(A) \cap \mathrm{null}(B) \ne \{0\}$, then (12.1.2) cannot have a unique
solution. (b) Give an example which shows that the converse is not true. (Hint: $A^+ b$
feasible.)


P12.1.2 Let $p_0(x), \ldots, p_n(x)$ be given polynomials and $(x_0, y_0), \ldots, (x_m, y_m)$ a given
set of coordinate pairs with $x_i \in [a, b]$. It is desired to find a polynomial
$p(x) = \sum_{k=0}^{n} a_k p_k(x)$ such that $\sum_{i=0}^{m} (p(x_i) - y_i)^2$ is minimized subject to the constraint that

$$ \int_a^b [p''(x)]^2 \, dx \approx h \sum_{i=0}^{N} p''(z_i)^2 \le \alpha^2 $$

where $z_i = a + ih$ and $b = a + Nh$. Show that this leads to an LSQI problem of the form
(12.1.1).


P12.1.3 Suppose $Y = [y_1, \ldots, y_k] \in \mathbb{R}^{m \times k}$ has the property that

$$ Y^T Y = \mathrm{diag}(d_1^2, \ldots, d_k^2), \qquad d_1 \ge d_2 \ge \cdots \ge d_k > 0 . $$

Show that if $Y = QR$ is the QR factorization of $Y$, then $R$ is diagonal with $|r_{ii}| = d_i$.


P12.1.4 (a) Show that if $(A^T A + \lambda I)x = A^T b$, $\lambda > 0$, and $\| x \|_2 = \alpha$, then $z =
(Ax - b)/\lambda$ solves the dual equations $(A A^T + \lambda I)z = -b$ with $\| A^T z \|_2 = \alpha$. (b) Show
that if $(A A^T + \lambda I)z = -b$, $\| A^T z \|_2 = \alpha$, then $x = -A^T z$ satisfies $(A^T A + \lambda I)x = A^T b$,
$\| x \|_2 = \alpha$.

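P12.1.5 Suppose $A$ is the $m$-by-1 matrix of ones and let $b \in \mathbb{R}^m$. Show that the
cross-validation technique with unit weights prescribes an optimal $\lambda$ given by

$$ \lambda = \left( \frac{\bar b^2}{s} - \frac{1}{m} \right)^{-1} $$

where $\bar b = (b_1 + \cdots + b_m)/m$ and $s = \sum_{i=1}^{m} (b_i - \bar b)^2 / (m - 1)$.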
P12.1.6 Establish equations (12.1.15), (12.1.16), and (12.1.17).

P12.1.7 Develop an SVD version of Algorithm 12.1.2 that can handle rank deficiency
in $A$ and $B$.

P12.1.8 Suppose

$$ A = \begin{bmatrix} A_1 \\ A_2 \end{bmatrix} $$

where $A_1 \in \mathbb{R}^{n \times n}$ is nonsingular and $A_2 \in \mathbb{R}^{(m-n) \times n}$. Show that

$$ \sigma_{\min}(A) \ge \sqrt{1 + \sigma_{\min}(A_2 A_1^{-1})^2}\; \sigma_{\min}(A_1) . $$


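P12.1.9 Consider the problem

$$ \min_{\substack{x^T B x = \beta^2 \\ x^T C x = \gamma^2}} \| Ax - b \|_2 , \qquad A \in \mathbb{R}^{m \times n},\ b \in \mathbb{R}^m,\ B, C \in \mathbb{R}^{n \times n} . $$

Assume that $B$ and $C$ are positive definite and that $Z \in \mathbb{R}^{n \times n}$ is a nonsingular matrix
with the property that $Z^T B Z = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$ and $Z^T C Z = I_n$. Assume that
$\lambda_1 \ge \cdots \ge \lambda_n$. (a) Show that the set of feasible $x$ is empty unless $\lambda_n \le \beta^2/\gamma^2 \le \lambda_1$.
(b) Using $Z$, show how the two constraint problem can be converted to a single constraint
problem of the form

$$ \min_{y^T W y = \beta^2 - \lambda_n \gamma^2} \| \tilde A y - b \|_2 $$

where $W = \mathrm{diag}(\lambda_1, \ldots, \lambda_n) - \lambda_n I$.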


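P12.1.10 Suppose $p \ge m \ge n$ and that $A \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{m \times p}$. Show how to
compute orthogonal $Q \in \mathbb{R}^{m \times m}$ and orthogonal $V \in \mathbb{R}^{p \times p}$ so that

$$ Q^T A = \begin{bmatrix} R \\ 0 \end{bmatrix}, \qquad Q^T B V = [\, 0 \;\; S \,] $$

where $R \in \mathbb{R}^{n \times n}$ and $S \in \mathbb{R}^{m \times m}$ are upper triangular.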
P12.1.11 Suppose $r \in \mathbb{R}^m$, $y \in \mathbb{R}^n$, and $\delta > 0$. Show how to solve the problem

$$ \min_{\substack{E \in \mathbb{R}^{m \times n} \\ \| E \|_2 \le \delta}} \| Ey - r \|_2 . $$

Repeat with “min” replaced by “max”.


Notes and References for Sec. 12.1 


Roughly speaking, regularization is a technique for transforming a poorly conditioned 
problem into a stable one. Quadratically constrained least squares is an important ex- 
ample. See 


L. Eldén (1977). “Algorithms for the Regularization of Ill-Conditioned Least Squares 
Problems,” BIT 17, 134-45. 


References for cross-validation include 


G.H. Golub, M. Heath, and G. Wahba (1979). “Generalized Cross-Validation as a
Method for Choosing a Good Ridge Parameter,” Technometrics 21, 215-23.

L. Eldén (1985). “A Note on the Computation of the Generalized Cross-Validation
Function for Ill-Conditioned Least Squares Problems,” BIT 24, 467-472.
G.E. Forsythe and G.H. Golub (1965). “On the Stationary Values of a Second-Degree
Polynomial on the Unit Sphere,” SIAM J. App. Math. 14, 1050-68.

L. Eldén (1980). “Perturbation Theory for the Least Squares Problem with Linear 
Equality Constraints,” SIAM J. Num. Anal. 17, 338-50. 

W. Gander (1981). “Least Squares with a Quadratic Constraint,” Numer. Math. 36,
291-307.

L. Eldén (1983). “A Weighted Pseudoinverse, Generalized Singular Values, and
Constrained Least Squares Problems,” BIT 22, 487-502.

G.W. Stewart (1984). “On the Asymptotic Behavior of Scaled Singular Value and QR
Decompositions,” Math. Comp. 43, 483-490.

G.H. Golub and U. von Matt (1991). “Quadratically Constrained Least Squares and
Quadratic Problems,” Numer. Math. 59, 561-580.

T.F. Chan, J.A. Olkin, and D. Cooley (1992). “Solving Quadratically Constrained Least
Squares Using Black Box Solvers,” BIT 32, 481-495.


Other computational aspects of the LSQI problem involve updating and the handling of 
banded and sparse problems. See 


K. Schittkowski and J. Stoer (1979). “A Factorization Method for the Solution of
Constrained Linear Least Squares Problems Allowing for Subsequent Data Changes,”
Numer. Math. 31, 431-463.

D.P. O’Leary and J.A. Simmons (1981). “A Bidiagonalization-Regularization Procedure 
for Large Scale Discretizations of Ill-Posed Problems,” SIAM J. Sci. and Stat. Comp. 
2, 474-489. 

Å. Björck (1984). “A General Updating Algorithm for Constrained Linear Least Squares
Problems,” SIAM J. Sci. and Stat. Comp. 5, 394-402.

L. Eldén (1984). “An Algorithm for the Regularization of Ill-Conditioned, Banded Least
Squares Problems,” SIAM J. Sci. and Stat. Comp. 5, 237-254.


Various aspects of the LSE problem are discussed and analyzed in 


M.J.D. Powell and J.K. Reid (1968). “On Applying Householder’s Method to Linear 
Least Squares Problems,” Proc. IFIP Congress, pp. 122-26. 

C. Van Loan (1985). “On the Method of Weighting for Equality Constrained Least
Squares Problems,” SIAM J. Numer. Anal. 22, 851-864.

J.L. Barlow, N.K. Nichols, and R.J. Plemmons (1988). “Iterative Methods for Equality
Constrained Least Squares Problems,” SIAM J. Sci. and Stat. Comp. 9, 892-906.

J.L. Barlow (1988). “Error Analysis and Implementation Aspects of Deferred Correction
for Equality Constrained Least-Squares Problems,” SIAM J. Num. Anal. 25,
1340-1358.

J.L. Barlow and S.L. Handy (1988). “The Direct Solution of Weighted and Equality
Constrained Least-Squares Problems,” SIAM J. Sci. Stat. Comp. 9, 704-716.

J.L. Barlow and U.B. Vemulapati (1992). “A Note on Deferred Correction for Equality 
Constrained Least Squares Problems,” SIAM J. Num. Anal. 29, 249-256.

M. Wei (1992). “Perturbation Theory for the Rank-Deficient Equality Constrained Least 
Squares Problem,” SIAM J. Num. Anal. 29, 1462-1481.

M. Wei (1992). “Algebraic Properties of the Rank-Deficient Equality-Constrained and 
Weighted Least Squares Problems," Lin. Alg. and Its Applic. 161, 27-44. 

M. Gulliksson and P-Å. Wedin (1992). “Modifying the QR-Decomposition to Con- 
strained and Weighted Linear Least Squares,” SIAM J. Matrix Anal. Appl. 13, 
1298-1313. 

M. Gulliksson (1995). “Backward Error Analysis for the Constrained and Weighted 
Linear Least Squares Problem When Using the Weighted QR Factorization,” SIAM 
J. Matrix Anal. Appl. 13, 675-687. 


Generalized factorizations have an important bearing on generalized least squares prob- 
lems. 


C.C. Paige (1985). “The General Linear Model and the Generalized Singular Value 
Decomposition,” Lin. Alg. and Its Applic. 70, 269-284. 

C.C. Paige (1990). “Some Aspects of Generalized QR Factorization,” in Reliable Nu- 
merical Computations, M. Cox and S. Hammarling (eds), Clarendon Press, Oxford. 

E. Anderson, Z. Bai, and J. Dongarra (1992). “Generalized QR Factorization and Its 
Applications,” Lin. Alg. and Its Applic. 162/163/164, 243-271. 


12.2 Subset Selection Using the SVD 


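As described in §5.5, the rank-deficient LS problem $\min\|Ax - b\|_2$ can be 
approached by approximating the minimum norm solution 

$$x_{LS} = \sum_{i=1}^{r}\frac{u_i^Tb}{\sigma_i}\,v_i, \qquad r = \mathrm{rank}(A)$$

with 

$$x_{\hat r} = \sum_{i=1}^{\hat r}\frac{u_i^Tb}{\sigma_i}\,v_i, \qquad \hat r \le r,$$

where 

$$A = U\Sigma V^T = \sum_{i=1}^{r}\sigma_i u_i v_i^T \qquad (12.2.1)$$

is the SVD of A and $\hat r$ is some numerically determined estimate of r. Note 
that $x_{\hat r}$ minimizes $\|A_{\hat r}x - b\|_2$, where 

$$A_{\hat r} = \sum_{i=1}^{\hat r}\sigma_i u_i v_i^T$$

is the closest matrix to A that has rank $\hat r$. See Theorem 2.5.3. 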

Replacing A by $A_{\hat r}$ in the LS problem amounts to filtering the small 
singular values and can make a great deal of sense in those situations where 
A is derived from noisy data. In other applications, however, rank deficiency 
implies redundancy among the factors that comprise the underlying model. 
In this case, the model-builder may not be interested in a predictor such 
as $A_{\hat r}x_{\hat r}$ that involves all n redundant factors. Instead, a predictor Ay may 
be sought where y has at most $\hat r$ nonzero components. The position of the 
nonzero entries determines which columns of A, i.e., which factors in the 
model, are to be used in approximating the observation vector b. How to 
pick these columns is the problem of subset selection and is the subject of 
this section. 

The contents of this section depend heavily upon §2.6 and Chapter 5. 


12.2.1 QR with Column Pivoting 


QR with column pivoting can be regarded as a method for selecting an 
independent subset of A's columns from which b might be predicted. Suppose we apply Algorithm 5.4.1 to $A \in \mathbb{R}^{m\times n}$ and compute an orthogonal 
Q and a permutation Π such that $R = Q^TA\Pi$ is upper triangular. If 
$R(1{:}\hat r, 1{:}\hat r)z = \tilde b(1{:}\hat r)$ where $\tilde b = Q^Tb$ and we set 

$$y = \Pi\begin{bmatrix} z\\ 0\end{bmatrix},$$

then Ay is an approximate LS predictor of b that involves the first $\hat r$ columns 
of $A\Pi$. 


12.2.2 Using the SVD 


Although QR with column pivoting is a fairly reliable way to handle near 
rank deficiency, the SVD is sometimes preferable for reasons discussed in 
§5.5. We therefore describe an SVD-based subset selection procedure due 
to Golub, Klema, and Stewart (1976) that proceeds as follows: 


• Compute the SVD $A = U\Sigma V^T$ and use it to determine a rank estimate $\hat r$. 


• Calculate a permutation matrix P such that the columns of the matrix 
$B_1 \in \mathbb{R}^{m\times\hat r}$ in $AP = [\,B_1\;\;B_2\,]$ are "sufficiently independent." 


• Predict b with the vector Ay where $y = P\begin{bmatrix} z\\ 0\end{bmatrix}$ and $z \in \mathbb{R}^{\hat r}$ minimizes 
$\|B_1z - b\|_2$. 


The second step is key. Since 

$$\min_{z\in\mathbb{R}^{\hat r}}\|B_1z - b\|_2 = \|Ay - b\|_2 \ge \min_{x\in\mathbb{R}^n}\|Ax - b\|_2,$$

it can be argued that the permutation P should be chosen to make the 
residual $(I - B_1B_1^+)b$ as small as possible. Unfortunately, such a solution 
procedure can be unstable. For example, if 

$$A = \begin{bmatrix} 1 & 1 & 0\\ 1 & 1+\epsilon & 0\\ 0 & 0 & 1\end{bmatrix}, \qquad b = \begin{bmatrix} 1\\ -1\\ 0\end{bmatrix},$$

$\hat r = 2$, and P = I, then $\min\|B_1z - b\|_2 = 0$, but $\|B_1^+b\|_2 = O(1/\epsilon)$. 
On the other hand, any proper subset involving the third column of A is 
strongly independent but renders a much worse residual. 
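The trade-off can be reproduced numerically. The minimal sketch below assumes NumPy, uses a hypothetical helper `subset_residual`, and takes ε = 1e-6 purely for illustration. For each pair of columns of the 3-by-3 example it fits b and reports the residual norm together with the norm of the fitted coefficients.

```python
import numpy as np

def subset_residual(A, b, cols):
    """Least squares fit of b using only the columns of A listed in cols.

    Returns (residual norm, coefficient norm)."""
    B1 = A[:, cols]
    z, *_ = np.linalg.lstsq(B1, b, rcond=None)
    return np.linalg.norm(B1 @ z - b), np.linalg.norm(z)

eps = 1e-6  # illustrative choice of epsilon
A = np.array([[1.0, 1.0, 0.0],
              [1.0, 1.0 + eps, 0.0],
              [0.0, 0.0, 1.0]])
b = np.array([1.0, -1.0, 0.0])

results = {}
for cols in ([0, 1], [0, 2], [1, 2]):
    res, znorm = subset_residual(A, b, cols)
    results[tuple(cols)] = (res, znorm)
    print(cols, f"residual = {res:.2e}", f"||z|| = {znorm:.2e}")
```

The first two columns (indices 0 and 1) give a residual at rounding level, but their coefficients are of order 1/ε. Either subset that contains the third column leaves a residual of √2, and its coefficients are small.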




This example shows that there can be a trade-off between the independence of the chosen columns and the norm of the residual that they render. 
How to proceed in the face of this trade-off requires additional mathematical machinery in the form of useful bounds on $\sigma_{\hat r}(B_1)$, the smallest singular 
value of $B_1$. 


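Theorem 12.2.1 Let the SVD of $A \in \mathbb{R}^{m\times n}$ be given by (12.2.1), and 
define the matrix $B_1 \in \mathbb{R}^{m\times\hat r}$, $\hat r \le \mathrm{rank}(A)$, by 

$$AP = [\,\underbrace{B_1}_{\hat r}\;\;\underbrace{B_2}_{n-\hat r}\,]$$

where $P \in \mathbb{R}^{n\times n}$ is a permutation. If 

$$P^TV = \begin{bmatrix} V_{11} & V_{12}\\ V_{21} & V_{22}\end{bmatrix}\begin{matrix}\hat r\\ n-\hat r\end{matrix} \qquad (12.2.2)$$

(column blocks of widths $\hat r$ and $n-\hat r$) and $V_{11}$ is nonsingular, then 

$$\frac{\sigma_{\hat r}(A)}{\|V_{11}^{-1}\|_2} \le \sigma_{\hat r}(B_1) \le \sigma_{\hat r}(A).$$

Proof. The upper bound follows from the minimax characterization of 
singular values given in §8.6.1. 
To establish the lower bound, partition the diagonal matrix of singular 
values as follows: 

$$\Sigma = \begin{bmatrix}\Sigma_1 & 0\\ 0 & \Sigma_2\end{bmatrix}\begin{matrix}\hat r\\ m-\hat r\end{matrix}$$

(column blocks of widths $\hat r$ and $n-\hat r$). If $w \in \mathbb{R}^{\hat r}$ is a unit vector with the property that $\|B_1w\|_2 = \sigma_{\hat r}(B_1)$, then 

$$\sigma_{\hat r}(B_1)^2 = \|B_1w\|_2^2 = \left\|U\Sigma V^TP\begin{bmatrix} w\\ 0\end{bmatrix}\right\|_2^2 = \|\Sigma_1V_{11}^Tw\|_2^2 + \|\Sigma_2V_{12}^Tw\|_2^2 .$$

The theorem now follows because the smallest singular value of $\Sigma_1$ is $\sigma_{\hat r}(A)$ and $\|V_{11}^Tw\|_2 \ge \sigma_{\min}(V_{11}) = 1/\|V_{11}^{-1}\|_2$, so that $\|\Sigma_1V_{11}^Tw\|_2 \ge \sigma_{\hat r}(A)/\|V_{11}^{-1}\|_2$. □ 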


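This result suggests that in the interest of obtaining a sufficiently independent subset of columns, we choose the permutation P such that the resulting $V_{11}$ submatrix is as well-conditioned as possible. A heuristic solution to 
this problem can be obtained by computing the QR with column-pivoting 
factorization of the matrix $[\,V_{11}^T\;\;V_{21}^T\,]$, where 

$$V = \begin{bmatrix} V_{11} & V_{12}\\ V_{21} & V_{22}\end{bmatrix}\begin{matrix}\hat r\\ n-\hat r\end{matrix}$$

(column blocks of widths $\hat r$ and $n-\hat r$) is a partitioning of the matrix V in (12.2.1). In particular, if we apply QR 
with column pivoting (Algorithm 5.4.1) to compute 

$$Q^T[\,V_{11}^T\;\;V_{21}^T\,]P = [\,\underbrace{R_{11}}_{\hat r}\;\;\underbrace{R_{12}}_{n-\hat r}\,]$$

where Q is orthogonal, P is a permutation matrix, and $R_{11}$ is upper triangular, then (12.2.2) implies 

$$\begin{bmatrix}\bar V_{11}\\ \bar V_{21}\end{bmatrix} = P^T\begin{bmatrix}V_{11}\\ V_{21}\end{bmatrix} = \begin{bmatrix}R_{11}^TQ^T\\ R_{12}^TQ^T\end{bmatrix},$$

where $\bar V_{11}$ and $\bar V_{21}$ denote the blocks of $P^TV$ that play the roles of $V_{11}$ and $V_{21}$ in (12.2.2). 
Note that $R_{11}$ is nonsingular and that $\|\bar V_{11}^{-1}\|_2 = \|R_{11}^{-1}\|_2$. Heuristically, 
column pivoting tends to produce a well-conditioned $R_{11}$, and so the overall 
process tends to produce a well-conditioned $\bar V_{11}$. Thus we obtain 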


Algorithm 12.2.1 Given $A \in \mathbb{R}^{m\times n}$ and $b \in \mathbb{R}^m$ the following algorithm computes a permutation P, a rank estimate $\hat r$, and a vector $z \in \mathbb{R}^{\hat r}$ 
such that the first $\hat r$ columns of B = AP are independent and such that 
$\|B(:,1{:}\hat r)z - b\|_2$ is minimized. 


Compute the SVD $U^TAV = \mathrm{diag}(\sigma_1,\ldots,\sigma_n)$ and save V. 

Determine $\hat r \le \mathrm{rank}(A)$. 

Apply QR with column pivoting: $Q^TV(:,1{:}\hat r)^TP = [\,R_{11}\;\;R_{12}\,]$ and set 
$AP = [\,B_1\;\;B_2\,]$ with $B_1 \in \mathbb{R}^{m\times\hat r}$ and $B_2 \in \mathbb{R}^{m\times(n-\hat r)}$. 

Determine $z \in \mathbb{R}^{\hat r}$ such that $\|b - B_1z\|_2 = \min$. 
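Algorithm 12.2.1 can be sketched in NumPy as shown below. The caller supplies the rank estimate $\hat r$. A hypothetical helper, `pivot_order`, emulates QR with column pivoting with a greedy modified Gram-Schmidt loop. At each step it picks the column of largest remaining norm, so in exact arithmetic it selects the same columns as Algorithm 5.4.1, up to how ties are broken. The sketch is illustrative, not a production implementation.

```python
import numpy as np

def pivot_order(M):
    """Column order chosen by QR with column pivoting (greedy, via modified Gram-Schmidt).

    At step j the remaining column of largest norm is moved to position j,
    normalized, and projected out of the columns to its right."""
    M = np.array(M, dtype=float)
    perm = np.arange(M.shape[1])
    for j in range(min(M.shape)):
        p = j + int(np.argmax(np.linalg.norm(M[:, j:], axis=0)))
        M[:, [j, p]] = M[:, [p, j]]          # swap columns j and p
        perm[[j, p]] = perm[[p, j]]
        q = M[:, j] / np.linalg.norm(M[:, j])
        M[:, j + 1:] -= np.outer(q, q @ M[:, j + 1:])
    return perm

def svd_subset_selection(A, b, r_hat):
    """Sketch of Algorithm 12.2.1: returns (perm, z, y) with y = P [z; 0]."""
    A = np.asarray(A, dtype=float)
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    perm = pivot_order(Vt[:r_hat, :])        # QR with pivoting on V(:, 1:r_hat)^T
    B1 = A[:, perm[:r_hat]]                  # first r_hat columns of AP
    z, *_ = np.linalg.lstsq(B1, b, rcond=None)
    y = np.zeros(A.shape[1])
    y[perm[:r_hat]] = z                      # y = P [z; 0]
    return perm, z, y

# Data of Example 12.2.1
A = np.array([[3, 4, 1.0001],
              [7, 4, -3.0002],
              [2, 5, 2.9999],
              [-1, 4, 5.0003]])
b = np.ones(4)
perm, z, y = svd_subset_selection(A, b, 2)
x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)
print("perm:", perm, "y:", y)
print("subset residual:", np.linalg.norm(A @ y - b))
print("full LS residual:", np.linalg.norm(A @ x_ls - b))
```

A is nearly rank 2, so every pair of its columns spans almost the same plane. The subset residual should therefore come out close to the .1966 reported in Example 12.2.1 whichever pair the pivoting selects. The pivot order itself may differ from $P = [\,e_3\;e_2\;e_1\,]$.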


Example 12.2.1 Let 

$$A = \begin{bmatrix} 3 & 4 & 1.0001\\ 7 & 4 & -3.0002\\ 2 & 5 & 2.9999\\ -1 & 4 & 5.0003\end{bmatrix}, \qquad b = \begin{bmatrix}1\\ 1\\ 1\\ 1\end{bmatrix}.$$

A is close to being rank 2 in the sense that $\sigma_3(A) \approx .0001$. Setting $\hat r = 2$ in Algorithm 
12.2.1 leads to $y = [\,0\;\;0.2360\;\;-0.0085\,]^T$ with $\|Ay - b\|_2 = .1966$. (The permutation 
P is given by $P = [\,e_3\;e_2\;e_1\,]$.) Note that $x_{LS} = [\,828.1056\;\;-827.8569\;\;828.0536\,]^T$ 
with minimum residual $\|Ax_{LS} - b\|_2 = 0.0343$. 


12.2.3 More on Column Independence vs. Residual 


We return to the discussion of the trade-off between column independence 
and norm of the residual. In particular, to assess the above method of 
subset selection, we need to examine the residual of the vector y that it 
produces, $r_y = b - Ay = b - B_1z = (I - B_1B_1^+)b$. Here, $B_1 = B(:,1{:}\hat r)$ with 
B = AP. To this end, it is appropriate to compare $r_y$ with $r_{x_{\hat r}} = b - Ax_{\hat r}$, 
since we are regarding A as a rank-$\hat r$ matrix and since $x_{\hat r}$ solves the nearest 
rank-$\hat r$ LS problem, namely, $\min\|A_{\hat r}x - b\|_2$. 


Theorem 12.2.2 If $r_y$ and $r_{x_{\hat r}}$ are defined as above and if $V_{11}$ is the leading 
$\hat r$-by-$\hat r$ principal submatrix of $P^TV$, then 

$$\|r_{x_{\hat r}} - r_y\|_2 \le \frac{\sigma_{\hat r+1}(A)}{\sigma_{\hat r}(A)}\,\|V_{11}^{-1}\|_2\,\|b\|_2 .$$

Proof. Note that $r_{x_{\hat r}} = (I - U_1U_1^T)b$ and $r_y = (I - Q_1Q_1^T)b$ where 

$$U = [\,\underbrace{U_1}_{\hat r}\;\;\underbrace{U_2}_{m-\hat r}\,]$$

is a partitioning of the matrix U in (12.2.1) and where $Q_1 = B_1(B_1^TB_1)^{-1/2}$. 
Using Theorem 2.6.1 we obtain 

$$\|r_{x_{\hat r}} - r_y\|_2 \le \|U_1U_1^T - Q_1Q_1^T\|_2\,\|b\|_2 = \|U_2^TQ_1\|_2\,\|b\|_2$$

while Theorem 12.2.1 permits us to conclude that 

$$\|U_2^TQ_1\|_2 \le \|U_2^TB_1\|_2\,\|(B_1^TB_1)^{-1/2}\|_2 \le \sigma_{\hat r+1}(A)\,\frac{1}{\sigma_{\hat r}(B_1)} \le \frac{\sigma_{\hat r+1}(A)}{\sigma_{\hat r}(A)}\,\|V_{11}^{-1}\|_2 . \;\square$$

Noting that 

$$\|r_{x_{\hat r}} - r_y\|_2 = \left\|B_1z - \sum_{i=1}^{\hat r}(u_i^Tb)\,u_i\right\|_2,$$

we see that Theorem 12.2.2 sheds light on how well $B_1z$ can predict the 
"stable" component of b, i.e., $U_1^Tb$. Any attempt to approximate $U_2^Tb$ 
can lead to a large norm solution. Moreover, the theorem says that if 
$\sigma_{\hat r+1}(A) \ll \sigma_{\hat r}(A)$, then any reasonably independent subset of columns 
produces essentially the same-sized residual. On the other hand, if there 
is no well-defined gap in the singular values, then the determination of $\hat r$ 
becomes difficult and the entire subset selection problem more complicated. 
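The bound of Theorem 12.2.2 is easy to probe numerically. The sketch below assumes NumPy. The sizes, the prescribed singular values and the random seed are arbitrary illustrative choices. It builds a matrix with a clear gap $\sigma_{\hat r+1} \ll \sigma_{\hat r}$ and checks the inequality for every choice of $\hat r$ columns. For each choice, $V_{11}$ consists of the rows of V indexed by the chosen columns, restricted to the first $\hat r$ columns. This is exactly the leading block of $P^TV$.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
m, n, r_hat = 8, 5, 3
sigma = np.array([10.0, 5.0, 2.0, 1e-3, 1e-4])     # gap between sigma_3 and sigma_4

# A = U(:, 1:n) diag(sigma) V^T with random orthogonal U and V
U, _ = np.linalg.qr(rng.standard_normal((m, m)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = U[:, :n] @ np.diag(sigma) @ V.T
b = rng.standard_normal(m)

U1 = U[:, :r_hat]
r_x = b - U1 @ (U1.T @ b)                           # residual of x_rhat

checks = []
for cols in itertools.combinations(range(n), r_hat):
    cols = list(cols)
    B1 = A[:, cols]
    z, *_ = np.linalg.lstsq(B1, b, rcond=None)
    r_y = b - B1 @ z
    V11 = V[np.ix_(cols, range(r_hat))]             # leading r_hat-by-r_hat block of P^T V
    lhs = np.linalg.norm(r_x - r_y)
    rhs = (sigma[r_hat] / sigma[r_hat - 1]
           * np.linalg.norm(np.linalg.inv(V11), 2) * np.linalg.norm(b))
    checks.append((cols, lhs, rhs))
    print(cols, f"{lhs:.2e} <= {rhs:.2e}")
```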


Problems 
P12.2.1 Suppose $A \in \mathbb{R}^{m\times n}$ and that $\|u^TA\|_2 = \sigma$ with $u^Tu = 1$. Show that if 
$u^T(Ax - b) = 0$ for $x \in \mathbb{R}^n$ and $b \in \mathbb{R}^m$, then $\|x\|_2 \ge |u^Tb|/\sigma$. 


P12.2.2 Show that if $B_1 \in \mathbb{R}^{m\times k}$ is comprised of k columns from $A \in \mathbb{R}^{m\times n}$ then 
$\sigma_k(B_1) \le \sigma_k(A)$. 


P12.2.3 In equation (12.2.2) we know that the matrix 

$$P^TV = \begin{bmatrix} V_{11} & V_{12}\\ V_{21} & V_{22}\end{bmatrix}\begin{matrix}\hat r\\ n-\hat r\end{matrix}$$

is orthogonal. Thus, $\|V_{11}^{-1}\|_2 = \|V_{22}^{-1}\|_2$ from the CS decomposition (Theorem 2.6.3). 
Show how to compute P by applying the QR with column pivoting algorithm to $[\,V_{12}^T\;\;V_{22}^T\,]$. 
(For $\hat r > n/2$, this procedure would be more economical than the technique discussed in 
the text.) Incorporate this observation in Algorithm 12.2.1. 


Notes and References for Sec. 12.2 


The material in this section is derived from 


G.H. Golub, V. Klema and G.W. Stewart (1976). “Rank Degeneracy and Least Squares 
Problems,” Technical Report TR-456, Department of Computer Science, University 
of Maryland, College Park, MD. 


A subset selection procedure based upon the total least squares fitting technique of §12.3 
is given in 

S. Van Huffel and J. Vandewalle (1987). “Subset Selection Using the Total Least Squares 
Approach in Collinearity Problems with Errors in the Variables,” Lin. Alg. and Its 
Applic. 88/89, 695-714. 


The literature on subset selection is vast and we refer the reader to 


H. Hotelling (1957). “The Relations of the Newer Multivariate Statistical Methods to 
Factor Analysis,” Brit. J. Stat. Psych. 10, 69-79. 


12.3 Total Least Squares 


The problem of minimizing $\|D(Ax - b)\|_2$ where $A \in \mathbb{R}^{m\times n}$, and $D = \mathrm{diag}(d_1,\ldots,d_m)$ is nonsingular can be recast as follows: 

$$\min_{b + r\,\in\,\mathrm{range}(A)}\|Dr\|_2, \qquad r \in \mathbb{R}^m. \qquad (12.3.1)$$

In this problem, there is a tacit assumption that the errors are confined to 
the "observation" b. When error is also present in the "data" A, then it 
may be more natural to consider the problem 

$$\min_{b + r\,\in\,\mathrm{range}(A + E)}\|D[\,E,\ r\,]T\|_F, \qquad E \in \mathbb{R}^{m\times n},\ r \in \mathbb{R}^m \qquad (12.3.2)$$

where $D = \mathrm{diag}(d_1,\ldots,d_m)$ and $T = \mathrm{diag}(t_1,\ldots,t_{n+1})$ are nonsingular. 
This problem, discussed in Golub and Van Loan (1980), is referred to as 
the total least squares (TLS) problem. 

If a minimizing $[\,E_0,\ r_0\,]$ can be found for (12.3.2), then any x satisfying 
$(A + E_0)x = b + r_0$ is called a TLS solution. However, it should be realized 
that (12.3.2) may fail to have a solution altogether. For example, if 

$$A = \begin{bmatrix}1 & 0\\ 0 & 0\\ 0 & 0\end{bmatrix}, \quad b = \begin{bmatrix}1\\ 1\\ 1\end{bmatrix}, \quad D = I_3, \quad T = I_3, \quad\text{and}\quad E_\epsilon = \begin{bmatrix}0 & 0\\ 0 & \epsilon\\ 0 & \epsilon\end{bmatrix},$$

then for all $\epsilon > 0$, $b \in \mathrm{ran}(A + E_\epsilon)$. However, there is no smallest value of 
$\|[\,E,\ r\,]\|_F$ for which $b + r \in \mathrm{ran}(A + E)$. 

A generalization of (12.3.2) results if we allow multiple right-hand sides. 
In particular, if $B \in \mathbb{R}^{m\times k}$, then we have the problem 

$$\min_{\mathrm{range}(B+R)\,\subseteq\,\mathrm{range}(A+E)}\|D[\,E,\ R\,]T\|_F \qquad (12.3.3)$$

where $E \in \mathbb{R}^{m\times n}$ and $R \in \mathbb{R}^{m\times k}$ and the matrices $D = \mathrm{diag}(d_1,\ldots,d_m)$ 
and $T = \mathrm{diag}(t_1,\ldots,t_{n+k})$ are nonsingular. If $[\,E_0,\ R_0\,]$ solves (12.3.3), 
then any $X \in \mathbb{R}^{n\times k}$ that satisfies $(A + E_0)X = (B + R_0)$ is said to be a 
TLS solution to (12.3.3). 

In this section we discuss some of the mathematical properties of the 
total least squares problem and show how it can be solved using the SVD. 
Chapter 5 is the only prerequisite. A very detailed treatment of the TLS 
problem is given in the monograph by Van Huffel and Vandewalle (1991). 


12.3.1 Mathematical Background 


The following theorem gives conditions for the uniqueness and existence of 
a TLS solution to the multiple right-hand side problem. 


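Theorem 12.3.1 Let A, B, D, and T be as above and assume $m \ge n + k$. 
Let 

$$C = D[\,A,\ B\,]T = [\,\underbrace{C_1}_{n}\;\;\underbrace{C_2}_{k}\,]$$

have SVD $U^TCV = \mathrm{diag}(\sigma_1,\ldots,\sigma_{n+k}) = \Sigma$ where U, V, and Σ are partitioned as follows: 

$$U = [\,\underbrace{U_1}_{n}\;\;\underbrace{U_2}_{k}\,], \qquad V = \begin{bmatrix}V_{11} & V_{12}\\ V_{21} & V_{22}\end{bmatrix}\begin{matrix}n\\ k\end{matrix}, \qquad \Sigma = \begin{bmatrix}\Sigma_1 & 0\\ 0 & \Sigma_2\end{bmatrix}\begin{matrix}n\\ k\end{matrix}$$

(the column blocks of V and Σ have widths n and k). If $\sigma_n(C_1) > \sigma_{n+1}(C)$, then the matrix $[\,E_0,\ R_0\,]$ defined by 

$$D[\,E_0,\ R_0\,]T = -U_2\Sigma_2[\,V_{12}^T,\ V_{22}^T\,] \qquad (12.3.4)$$

solves (12.3.3). If $T_1 = \mathrm{diag}(t_1,\ldots,t_n)$ and $T_2 = \mathrm{diag}(t_{n+1},\ldots,t_{n+k})$ then 
the matrix 

$$X_{TLS} = -T_1V_{12}V_{22}^{-1}T_2^{-1}$$

exists and is the unique solution to $(A + E_0)X = B + R_0$. 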
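Theorem 12.3.1 translates almost line for line into NumPy. The sketch below is illustrative, not a robust solver. The function name `tls_multi` is hypothetical, and D and T are passed as vectors holding their diagonal entries. The code refuses to proceed when the hypothesis $\sigma_n(C_1) > \sigma_{n+1}(C)$ fails. It returns $X_{TLS}$ together with the perturbation $[\,E_0,\ R_0\,]$ of (12.3.4).

```python
import numpy as np

def tls_multi(A, B, d=None, t=None):
    """Sketch of Theorem 12.3.1: X_TLS = -T1 V12 V22^{-1} T2^{-1} and [E0, R0] from (12.3.4).

    d and t hold the diagonals of D and T (all ones by default)."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    m, n = A.shape
    k = B.shape[1]
    d = np.ones(m) if d is None else np.asarray(d, dtype=float)
    t = np.ones(n + k) if t is None else np.asarray(t, dtype=float)

    C = d[:, None] * np.hstack([A, B]) * t[None, :]       # C = D [A, B] T
    U, s, Vt = np.linalg.svd(C, full_matrices=False)
    s_C1 = np.linalg.svd(C[:, :n], compute_uv=False)
    if not s_C1[n - 1] > s[n]:
        raise ValueError("sigma_n(C1) > sigma_{n+1}(C) does not hold")

    V = Vt.T
    V12, V22 = V[:n, n:], V[n:, n:]
    X = -(t[:n, None] * V12) @ np.linalg.solve(V22, np.diag(1.0 / t[n:]))

    # D [E0, R0] T = -U2 Sigma2 [V12^T, V22^T]; then undo the scaling by D and T
    ER = -(U[:, n:] * s[n:]) @ V[:, n:].T
    ER = ER / d[:, None] / t[None, :]
    return X, ER[:, :n], ER[:, n:]

rng = np.random.default_rng(1)
A = rng.standard_normal((10, 3))
X0 = rng.standard_normal((3, 2))
B = A @ X0 + 1e-3 * rng.standard_normal((10, 2))
d = rng.uniform(0.5, 2.0, 10)
t = rng.uniform(0.5, 2.0, 5)
X, E0, R0 = tls_multi(A, B, d, t)
print(np.abs((A + E0) @ X - (B + R0)).max())           # consistency check
```

The last line checks the defining property $(A + E_0)X_{TLS} = B + R_0$. Its printed value should be at the level of rounding error.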
Proof. We first establish two results that follow from the assumption 
$\sigma_n(C_1) > \sigma_{n+1}(C)$. From the equation $CV = U\Sigma$ we have $C_1V_{12} + C_2V_{22} = U_2\Sigma_2$. We wish to show that $V_{22}$ is nonsingular. Suppose $V_{22}x = 0$ for some 
unit 2-norm x. It follows from $V_{12}^TV_{12} + V_{22}^TV_{22} = I$ that $\|V_{12}x\|_2 = 1$. But 
then 

$$\sigma_{n+1}(C) \ge \|U_2\Sigma_2x\|_2 = \|C_1V_{12}x\|_2 \ge \sigma_n(C_1),$$

a contradiction. Thus, the submatrix $V_{22}$ is nonsingular. 

The other fact that follows from $\sigma_n(C_1) > \sigma_{n+1}(C)$ concerns the strict 
separation of $\sigma_n(C)$ and $\sigma_{n+1}(C)$. From Corollary 8.3.3, we have $\sigma_n(C) \ge \sigma_n(C_1)$ and so $\sigma_n(C) \ge \sigma_n(C_1) > \sigma_{n+1}(C)$. 

Now we are set to prove the theorem. If $\mathrm{ran}(B + R) \subset \mathrm{ran}(A + E)$, 
then there is an X (n-by-k) so $(A + E)X = B + R$, i.e., 

$$\{\,D[\,A,\ B\,]T + D[\,E,\ R\,]T\,\}\,T^{-1}\begin{bmatrix}X\\ -I_k\end{bmatrix} = 0. \qquad (12.3.5)$$

Thus, the matrix in curly brackets has, at most, rank n. By following the 
argument in Theorem 2.5.3, it can be shown that 

$$\|D[\,E,\ R\,]T\|_F^2 \ge \sum_{i=n+1}^{n+k}\sigma_i(C)^2$$

and that the lower bound is realized by setting $[\,E,\ R\,] = [\,E_0,\ R_0\,]$. The 
inequality $\sigma_n(C) > \sigma_{n+1}(C)$ ensures that $[\,E_0,\ R_0\,]$ is the unique minimizer. 
The null space of 

$$D[\,A,\ B\,]T + D[\,E_0,\ R_0\,]T = U_1\Sigma_1[\,V_{11}^T\;\;V_{21}^T\,]$$

is the range of $\begin{bmatrix}V_{12}\\ V_{22}\end{bmatrix}$. Thus, from (12.3.5) 

$$T^{-1}\begin{bmatrix}X\\ -I_k\end{bmatrix} = \begin{bmatrix}V_{12}\\ V_{22}\end{bmatrix}S$$

for some k-by-k matrix S. From the equations $T_1^{-1}X = V_{12}S$ and $-T_2^{-1} = V_{22}S$ we see that $S = -V_{22}^{-1}T_2^{-1}$. Thus, we must have 

$$X = T_1V_{12}S = -T_1V_{12}V_{22}^{-1}T_2^{-1} = X_{TLS}. \;\square$$


If $\sigma_n(C) = \sigma_{n+1}(C)$, then the TLS problem may still have a solution, 
although it may not be unique. In this case, it may be desirable to single 
out a "minimal norm" solution. To this end, consider the τ-norm defined 
on $\mathbb{R}^{n\times k}$ by $\|Z\|_\tau = \|T_1^{-1}ZT_2\|_2$. If X is given by (12.3.5), then from the 
CS decomposition (Theorem 2.6.3) we have 

$$\|X\|_\tau^2 = \|V_{12}V_{22}^{-1}\|_2^2 = \frac{1 - \sigma_k(V_{22})^2}{\sigma_k(V_{22})^2}.$$

This suggests choosing V in Theorem 12.3.1 so that $\sigma_k(V_{22})$ is maximized. 
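12.3.2 Computations for the k = 1 Case 


We show how to maximize $V_{22}$ in the important k = 1 case. Suppose 
the singular values of C satisfy $\sigma_{n-p} > \sigma_{n-p+1} = \cdots = \sigma_{n+1}$ and let 
$V = [\,v_1,\ldots,v_{n+1}\,]$ be a column partitioning of V. If Q is a Householder 
matrix such that 

$$V(:,n+1-p{:}n+1)\,Q = \begin{bmatrix}W & z\\ 0 & \alpha\end{bmatrix}\begin{matrix}n\\ 1\end{matrix}$$

(column blocks of widths p and 1), then $\begin{bmatrix}z\\ \alpha\end{bmatrix}$ has the largest (n+1)-st component of all the unit vectors in 
$\mathrm{span}\{v_{n+1-p},\ldots,v_{n+1}\}$. If α = 0, the TLS problem has no solution. Otherwise $x_{TLS} = -T_1z/(t_{n+1}\alpha)$. Moreover, $\begin{bmatrix}z\\ \alpha\end{bmatrix}$ is a unit right singular vector of $D[\,A,\ b\,]T$ associated with $\sigma_{n+1}$, and so 

$$D[\,E_0,\ r_0\,]T = -D[\,A,\ b\,]T\begin{bmatrix}z\\ \alpha\end{bmatrix}[\,z^T\ \ \alpha\,].$$

Overall, we have the following algorithm: 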
Overall, we have the following algorithm: 


Algorithm 12.3.1 Given $A \in \mathbb{R}^{m\times n}$ (m > n), $b \in \mathbb{R}^m$, and nonsingular 
$D = \mathrm{diag}(d_1,\ldots,d_m)$ and $T = \mathrm{diag}(t_1,\ldots,t_{n+1})$, the following algorithm 
computes (if possible) a vector $x_{TLS} \in \mathbb{R}^n$ such that $(A + E_0)x = (b + r_0)$ 
and $\|D[\,E_0,\ r_0\,]T\|_F$ is minimal. 

Compute the SVD $U^T(D[\,A,\ b\,]T)V = \mathrm{diag}(\sigma_1,\ldots,\sigma_{n+1})$. Save V. 

Determine p such that $\sigma_1 \ge \cdots \ge \sigma_{n-p} > \sigma_{n-p+1} = \cdots = \sigma_{n+1}$. 

Compute a Householder matrix P such that if $\tilde V = VP$, then 

$\tilde V(n+1, n-p+1{:}n) = 0$ 


if $\tilde v_{n+1,n+1} \ne 0$ 
  for i = 1:n 
    $x_i = -t_i\tilde v_{i,n+1}/(t_{n+1}\tilde v_{n+1,n+1})$ 
  end 
end 


This algorithm requires about $2mn^2 + 12n^3$ flops and most of these are 
associated with the SVD computation. 


Example 12.3.1 The TLS problem 

$$\min_{(a+e)x = b+r}\|[\,e,\ r\,]\|_F$$

where $a = [\,1,\ 2,\ 3,\ 4\,]^T$ and $b = [\,2.01,\ 3.99,\ 5.80,\ 8.30\,]^T$ has solution $x_{TLS} = 2.0212$, $e = [\,-.0045,\ -.0209,\ -.1048,\ .0855\,]^T$, 
and $r = [\,.0022,\ .0103,\ .0519,\ -.0423\,]^T$. Note that for this data $x_{LS} = 2.0197$. 
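A NumPy sketch of Algorithm 12.3.1 follows. NumPy, the function name `tls_single`, and the relative tolerance `tol` used to decide which singular values are tied with $\sigma_{n+1}$ are all assumptions. The sketch does not form the Householder matrix explicitly. If $w^T$ is the last row of $V(:,n+1-p{:}n+1)$, then the last column of $V(:,n+1-p{:}n+1)Q$ equals $\pm V(:,n+1-p{:}n+1)\,w/\|w\|_2$, so $\alpha = \pm\|w\|_2$. Applied to Example 12.3.1, the sketch should reproduce $x_{TLS} \approx 2.0212$.

```python
import numpy as np

def tls_single(A, b, d=None, t=None, tol=1e-12):
    """Sketch of Algorithm 12.3.1 (k = 1). Returns x_TLS, or None if alpha = 0."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    m, n = A.shape
    d = np.ones(m) if d is None else np.asarray(d, dtype=float)
    t = np.ones(n + 1) if t is None else np.asarray(t, dtype=float)

    C = d[:, None] * np.column_stack([A, b]) * t[None, :]  # D [A, b] T
    _, s, Vt = np.linalg.svd(C, full_matrices=False)
    V = Vt.T

    # p = number of sigma_j (j <= n) tied with sigma_{n+1}, up to a relative tolerance
    p = int(np.sum(s[:n] - s[n] <= tol * s[0]))
    W = V[:, n - p:]                  # columns v_{n+1-p}, ..., v_{n+1}
    w = W[n, :]                       # their (n+1)-st components
    alpha = np.linalg.norm(w)
    if alpha <= tol:
        return None                   # the TLS problem has no solution
    v = W @ (w / alpha)               # unit vector [z; alpha] in span(W)
    z = v[:n]
    return -t[:n] * z / (t[n] * alpha)

a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([2.01, 3.99, 5.80, 8.30])
x_tls = tls_single(a[:, None], b)
x_ls = (a @ b) / (a @ a)
print(x_tls, x_ls)                    # compare with 2.0212 and 2.0197 in Example 12.3.1
```

The function returns None for the 3-by-2 example at the start of this section, where (12.3.2) has no solution. In that case the right singular vector for $\sigma_3 = 0$ is $e_2$, whose last component vanishes.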
12.3.3 Geometric Interpretation 


It can be shown that the TLS solution $x_{TLS}$ minimizes 

$$\psi(x) = \sum_{i=1}^{m} d_i^2\,\frac{(a_i^Tx - b_i)^2}{x^TT_1^{-2}x + t_{n+1}^{-2}}$$

where $a_i^T$ is the ith row of A and $b_i$ is the ith component of b. A geometrical 
interpretation of the TLS problem is made possible by this observation. 
Indeed, 

$$\frac{(a_i^Tx - b_i)^2}{x^TT_1^{-2}x + t_{n+1}^{-2}}$$

is the square of the distance from $\begin{bmatrix}a_i\\ b_i\end{bmatrix} \in \mathbb{R}^{n+1}$ to the nearest point in 
the subspace 

$$P_x = \left\{\begin{bmatrix}a\\ b\end{bmatrix} : a \in \mathbb{R}^n,\ b \in \mathbb{R},\ b = x^Ta\right\}$$

where distance in $\mathbb{R}^{n+1}$ is measured by the norm $\|z\| = \|Tz\|_2$. A great 
deal has been written about this kind of fitting. See Pearson (1901) and 
Madansky (1959). 
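This characterization can be checked on Example 12.3.1, where $D = I_4$ and $T = I_2$. Write $y = T^{-1}[\,x;\ -1\,]$. Then $Cy = D(Ax - b)$ and $\|y\|_2^2 = x^TT_1^{-2}x + t_{n+1}^{-2}$. So ψ(x) is the quotient $\|Cy\|_2^2/\|y\|_2^2$, which is bounded below by $\sigma_{n+1}(C)^2$. The NumPy sketch below minimizes ψ by brute-force grid search, chosen only for transparency; the grid limits are arbitrary. It then compares the minimizer with $x_{TLS} = 2.0212$ and the minimum value with $\sigma_2(C)^2$.

```python
import numpy as np

def psi(x, A, b, d, t):
    """psi(x) = sum_i d_i^2 (a_i^T x - b_i)^2 / (x^T T1^{-2} x + t_{n+1}^{-2})."""
    n = A.shape[1]
    num = np.sum(d ** 2 * (A @ x - b) ** 2)
    den = np.sum((x / t[:n]) ** 2) + t[n] ** -2.0
    return num / den

A = np.array([[1.0], [2.0], [3.0], [4.0]])
b = np.array([2.01, 3.99, 5.80, 8.30])
d = np.ones(4)
t = np.ones(2)

grid = np.linspace(1.9, 2.1, 20001)               # step 1e-5
values = np.array([psi(np.array([x]), A, b, d, t) for x in grid])
x_best = grid[np.argmin(values)]
sigma_min = np.linalg.svd(np.column_stack([A, b]), compute_uv=False)[-1]
print(x_best, values.min(), sigma_min ** 2)
```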


Problems 


P12.3.1 Consider the TLS problem (12.3.2) with nonsingular D and T. (a) Show that 
if rank(A) < n, then (12.3.2) has a solution if and only if $b \in \mathrm{ran}(A)$. (b) Show that if 
rank(A) = n, then (12.3.2) has no solution if $A^TD^2b = 0$ and $|t_{n+1}|\,\|Db\|_2 = \sigma_n(DAT_1)$ 
where $T_1 = \mathrm{diag}(t_1,\ldots,t_n)$. 


P12.3.2 Show that if $C = D[\,A,\ b\,]T = [\,A_1,\ d\,]$ and $\sigma_n(C) > \sigma_{n+1}(C)$, then the TLS 
solution x satisfies $(A_1^TA_1 - \sigma_{n+1}(C)^2I)x = A_1^Td$. 


P12.3.3 Show how to solve (12.3.2) with the added constraint that the first p columns 
of the minimizing E are zero. 


Notes and References for Sec. 12.3 


This section is based upon 


G.H. Golub and C.F. Van Loan (1980). “An Analysis of the Total Least Squares Prob- 
lem,” SIAM J. Num. Anal. 17, 883-93. 


The bearing of the SVD on the TLS problem is set forth in 


G.H. Golub and C. Reinsch (1970). “Singular Value Decomposition and Least Squares 
Solutions,” Numer. Math. 14, 403-420. 


G.H. Golub (1973). “Some Modified Matrix Eigenvalue Problems,” SIAM Review 15, 
318-334. 


The most detailed study of the TLS problem is 


S. Van Huffel and J. Vandewalle (1991). The Total Least Squares Problem: Computa- 
tional Aspects and Analysis, SIAM Publications, Philadelphia. 


If some of the columns of A are known exactly then it is sensible to force the TLS per- 
turbation matrix E to be zero in the same columns. Aspects of this constrained TLS 
problem are discussed in 


J.W. Demmel (1987). “The Smallest Perturbation of a Submatrix which Lowers the 
Rank and Constrained Total Least Squares Problems,” SIAM J. Numer. Anal. 24, 
199-206. 

S. Van Huffel and J. Vandewalle (1988). “The Partial Total Least Squares Algorithm,” 
J. Comp. and App. Math. 21, 333-342. 

S. Van Huffel and J. Vandewalle (1988). “Analysis and Solution of the Nongeneric Total 
Least Squares Problem,” SIAM J. Matrix Anal. Appl. 9, 360-372. 

S. Van Huffel and J. Vandewalle (1989). “Analysis and Properties of the Generalized 
Total Least Squares Problem AX ≈ B When Some or All Columns in A are Subject 
to Error,” SIAM J. Matrix Anal. Appl. 10, 294-315. 

S. Van Huffel and H. Zha (1991). “The Restricted Total Least Squares Problem: For- 
mulation, Algorithm, and Properties,” SIAM J. Matrix Anal. Appl. 12, 292-309. 

S. Van Huffel (1992). “On the Significance of Nongeneric Total Least Squares Problems,” 
SIAM J. Matrix Anal. Appl. 13, 20-35. 

M. Wei (1992). “The Analysis for the Total Least Squares Problem with More than One 
Solution,” SIAM J. Matrix Anal. Appl. 13, 746-763. 

S. Van Huffel and H. Zha (1993). “An Efficient Total Least Squares Algorithm Based 
On a Rank-Revealing Two-Sided Orthogonal Decomposition,” Numerical Algorithms 
4, 101-133. 

C.C. Paige and M. Wei (1993). “Analysis of the Generalized Total Least Squares Problem 
AX = B when Some of the Columns are Free of Error,” Numer. Math. 65, 177-202. 

R.D. Fierro and J.R. Bunch (1994). “Collinearity and Total Least Squares,” SIAM J. 
Matrix Anal. Appl. 15, 1167-1181. 


Other references concerned with least squares fitting when there are errors in the data 
matrix include 


K. Pearson (1901). “On Lines and Planes of Closest Fit to Points in Space," Phil. Mag. 
2, 559-72. 

A. Wald (1940). “The Fitting of Straight Lines if Both Variables are Subject to Error,” 
Annals of Mathematical Statistics 11, 284-300. 

A. Madansky (1959). “The Fitting of Straight Lines When Both Variables Are Subject 
to Error,” J. Amer. Stat. Assoc. 54, 173-205. 

I. Linnik (1961). Method of Least Squares and Principles of the Theory of Observations, 
Pergamon Press, New York. 

W.G. Cochrane (1968). “Errors of Measurement in Statistics,” Technometrics 10, 637-66. 

R.F. Gunst, J.T. Webster, and R.L. Mason (1976). “A Comparison of Least Squares 
and Latent Root Regression Estimators,” Technometrics 18, 75-83. 

G.W. Stewart (1977c). "Sensitivity Coefficients for the Effects of Errors in the Inde- 
pendent Variables in a Linear Regression," Technical Report TR-571, Department of 
Computer Science, University of Maryland, College Park, MD. 

A. Van der Sluis and G.W. Veltkamp (1979). “Restoring Rank and Consistency by 
Orthogonal Projection," Lin. Alg. and Its Applic. 28, 257-78. 


12.4 Computing Subspaces with the SVD 


It is sometimes necessary to investigate the relationship between two given 
subspaces. How close are they? Do they intersect? Can one be "rotated" 
into the other? And so on. In this section we show how questions like 
these can be answered using the singular value decomposition. Knowledge 
of Chapter 5 and §8.6 is assumed. 


12.4.1 Rotation of Subspaces 


Suppose $A \in \mathbb{R}^{m\times p}$ is a data matrix obtained by performing a certain set 
of experiments. If the same set of experiments is performed again, then a 
different data matrix, $B \in \mathbb{R}^{m\times p}$, is obtained. In the orthogonal Procrustes 
problem the possibility that B can be rotated into A is explored by solving 
the following problem: 

$$\text{minimize } \|A - BQ\|_F \text{ subject to } Q^TQ = I_p. \qquad (12.4.1)$$

Recall that the trace of a matrix is the sum of its diagonal entries and thus, 
$\mathrm{tr}(C^TC) = \|C\|_F^2$. It follows that if $Q \in \mathbb{R}^{p\times p}$ is orthogonal, then 

$$\|A - BQ\|_F^2 = \mathrm{tr}(A^TA) + \mathrm{tr}(B^TB) - 2\,\mathrm{tr}(Q^TB^TA).$$

Thus, (12.4.1) is equivalent to the problem of maximizing $\mathrm{tr}(Q^TB^TA)$. 

The maximizing Q can be found by calculating the SVD of $B^TA$. Indeed, if $U^T(B^TA)V = \Sigma = \mathrm{diag}(\sigma_1,\ldots,\sigma_p)$ is the SVD of this matrix 
and we define the orthogonal matrix Z by $Z = V^TQ^TU$, then 

$$\mathrm{tr}(Q^TB^TA) = \mathrm{tr}(Q^TU\Sigma V^T) = \mathrm{tr}(Z\Sigma) = \sum_{i=1}^{p} z_{ii}\sigma_i \le \sum_{i=1}^{p}\sigma_i ,$$

where the inequality holds because $|z_{ii}| \le 1$ for an orthogonal Z. 
Clearly, the upper bound is attained by setting $Q = UV^T$ for then Z = I. 
This gives the following algorithm: 


Algorithm 12.4.1 Given A and B in $\mathbb{R}^{m\times p}$, the following algorithm finds 
an orthogonal $Q \in \mathbb{R}^{p\times p}$ such that $\|A - BQ\|_F$ is minimum. 


$C = B^TA$ 
Compute the SVD $U^TCV = \Sigma$. Save U and V. 
$Q = UV^T$ 


The solution matrix Q is the orthogonal polar factor of $B^TA$. See §4.2.10. 


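Example 12.4.1 

$$Q = \begin{bmatrix} .9999 & -.0126\\ .0126 & .9999\end{bmatrix} \quad\text{minimizes}\quad \left\|\begin{bmatrix}1 & 2\\ 3 & 4\\ 5 & 6\\ 7 & 8\end{bmatrix} - \begin{bmatrix}1.2 & 2.1\\ 2.9 & 4.3\\ 5.2 & 6.1\\ 6.8 & 8.1\end{bmatrix}Q\right\|_F .$$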
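Algorithm 12.4.1 requires only one SVD. Below is a minimal NumPy sketch, with the hypothetical function name `procrustes`, applied to the data of Example 12.4.1.

```python
import numpy as np

def procrustes(A, B):
    """Orthogonal Q minimizing ||A - B Q||_F (sketch of Algorithm 12.4.1)."""
    C = B.T @ A
    U, _, Vt = np.linalg.svd(C)      # C = U diag(sigma) V^T
    return U @ Vt                    # Q = U V^T

A = np.array([[1, 2], [3, 4], [5, 6], [7, 8]], dtype=float)
B = np.array([[1.2, 2.1], [2.9, 4.3], [5.2, 6.1], [6.8, 8.1]])
Q = procrustes(A, B)
print(np.round(Q, 4))
print(np.linalg.norm(A - B @ Q), np.linalg.norm(A - B))
```

numpy.linalg.svd returns $V^T$ rather than V. That is why the product is formed as `U @ Vt`.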
12.4.2 Intersection of Null Spaces 


Let $A \in \mathbb{R}^{m\times n}$ and $B \in \mathbb{R}^{p\times n}$ be given, and consider the problem of finding 
an orthonormal basis for $\mathrm{null}(A) \cap \mathrm{null}(B)$. One approach is to compute 
the null space of the matrix 

$$C = \begin{bmatrix}A\\ B\end{bmatrix}$$

since $Cx = 0 \Leftrightarrow x \in \mathrm{null}(A) \cap \mathrm{null}(B)$. However, a more economical 
procedure results if we exploit the following theorem. 


Theorem 12.4.1 Suppose $A \in \mathbb{R}^{m\times n}$ and let $\{z_1,\ldots,z_t\}$ be an orthonormal 
basis for null(A). Define $Z = [\,z_1,\ldots,z_t\,]$ and let $\{w_1,\ldots,w_q\}$ be an 
orthonormal basis for null(BZ) where $B \in \mathbb{R}^{p\times n}$. If $W = [\,w_1,\ldots,w_q\,]$, 
then the columns of ZW form an orthonormal basis for $\mathrm{null}(A) \cap \mathrm{null}(B)$. 


Proof. Since AZ = 0 and (BZ)W = 0, we clearly have $\mathrm{ran}(ZW) \subset \mathrm{null}(A) \cap \mathrm{null}(B)$. Now suppose x is in both null(A) and null(B). It follows 
that x = Za for some $0 \ne a \in \mathbb{R}^t$. But since 0 = Bx = BZa, we must have 
a = Wb for some $b \in \mathbb{R}^q$. Thus, $x = ZWb \in \mathrm{ran}(ZW)$. Finally, the columns of ZW are orthonormal because $(ZW)^T(ZW) = W^TZ^TZW = W^TW = I_q$. □ 


When the SVD is used to compute the orthonormal bases in this theorem 
we obtain the following procedure: 


Algorithm 12.4.2 Given $A \in \mathbb{R}^{m\times n}$ and $B \in \mathbb{R}^{p\times n}$, the following algorithm computes an integer s and a matrix $Y = [\,y_1,\ldots,y_s\,]$ having 
orthonormal columns which span $\mathrm{null}(A) \cap \mathrm{null}(B)$. If the intersection is 
trivial then s = 0. 


Compute the SVD $U_A^TAV_A = \mathrm{diag}(\sigma_i)$. Save $V_A$ and set 
r = rank(A). 
if r < n 
  $C = BV_A(:,r+1{:}n)$ 
  Compute the SVD $U_C^TCV_C = \mathrm{diag}(\gamma_i)$. Save $V_C$ and set 
  q = rank(C). 
  if q < n - r 
    s = n - r - q 
    $Y = V_A(:,r+1{:}n)\,V_C(:,q+1{:}n-r)$ 
  else 
    s = 0 
  end 
else 
  s = 0 
end 


The amount of work required by this algorithm depends upon the relative 
sizes of m, n, p, and r. 

We mention that a practical implementation of this algorithm requires 
a means for deciding when a computed singular value $\hat\sigma_i$ is negligible. The 
use of a tolerance δ for this purpose (e.g. $\hat\sigma_i < \delta \Rightarrow \hat\sigma_i = 0$) implies that 
the columns of the computed Y "almost" define a common null space of A 
and B in the sense that $\|AY\|_2 \approx \|BY\|_2 \approx \delta$. 


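Example 12.4.2 If 

$$A = \begin{bmatrix}1 & -1 & 1\\ 1 & -1 & 1\\ 1 & -1 & 1\end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix}4 & 2 & 0\\ 2 & 1 & 0\\ 6 & 3 & 0\end{bmatrix}$$

then $\mathrm{null}(A) \cap \mathrm{null}(B) = \mathrm{span}\{z\}$, where $z = [\,1\ \ -2\ \ -3\,]^T$. Applying Algorithm 12.4.2 
we find 

$$V_A(:,2{:}3)\,V_C(:,2) = \begin{bmatrix}-.8165 & .0000\\ -.4082 & .7071\\ .4082 & .7071\end{bmatrix}\begin{bmatrix}-.3273\\ -.9449\end{bmatrix} = \begin{bmatrix}.2673\\ -.5345\\ -.8018\end{bmatrix} = .2673\begin{bmatrix}1\\ -2\\ -3\end{bmatrix}.$$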
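Algorithm 12.4.2 can be sketched in NumPy as follows, using the tolerance δ discussed above. The helper `null_basis` and the default δ = 1e-10 are assumptions made for illustration. δ is applied as an absolute threshold on the computed singular values.

```python
import numpy as np

def null_basis(M, delta):
    """Orthonormal basis (as columns) for the numerical null space of M:
    singular values below delta are treated as zero."""
    _, s, Vt = np.linalg.svd(M)               # full Vt: all n right singular vectors
    rank = int(np.sum(s > delta))
    return Vt[rank:, :].T

def null_intersection(A, B, delta=1e-10):
    """Sketch of Algorithm 12.4.2: Y with orthonormal columns spanning
    null(A) intersect null(B)."""
    n = A.shape[1]
    Z = null_basis(A, delta)                  # V_A(:, r+1:n)
    if Z.shape[1] == 0:
        return np.zeros((n, 0))               # r = n, so s = 0
    W = null_basis(B @ Z, delta)              # V_C(:, q+1:n-r)
    return Z @ W                              # s = number of columns

A = np.array([[1, -1, 1], [1, -1, 1], [1, -1, 1]], dtype=float)
B = np.array([[4, 2, 0], [2, 1, 0], [6, 3, 0]], dtype=float)
Y = null_intersection(A, B)
print(Y.shape[1], Y[:, 0] / Y[0, 0])          # s and the normalized basis vector
```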


12.4.3 Angles Between Subspaces 


Let F and G be subspaces in $\mathbb{R}^m$ whose dimensions satisfy 

$$p = \dim(F) \ge \dim(G) = q \ge 1.$$

The principal angles $\theta_1,\ldots,\theta_q \in [0,\pi/2]$ between F and G are defined 
recursively by 

$$\cos(\theta_k) = \max_{u\in F}\max_{v\in G} u^Tv = u_k^Tv_k$$

subject to: 

$$\|u\|_2 = \|v\|_2 = 1, \qquad u^Tu_i = 0,\ \ i = 1{:}k-1, \qquad v^Tv_i = 0,\ \ i = 1{:}k-1.$$

Note that the principal angles satisfy $0 \le \theta_1 \le \cdots \le \theta_q \le \pi/2$. The vectors 
$\{u_1,\ldots,u_q\}$ and $\{v_1,\ldots,v_q\}$ are called the principal vectors between the 
subspaces F and G. 

Principal angles and vectors arise in many important statistical applications. The largest principal angle is related to the notion of distance 
between equidimensional subspaces that we discussed in §2.6.3. If p = q 
then $\mathrm{dist}(F,G) = \sqrt{1 - \cos(\theta_p)^2} = \sin(\theta_p)$. 

If the columns of $Q_F \in \mathbb{R}^{m\times p}$ and $Q_G \in \mathbb{R}^{m\times q}$ define orthonormal bases 
for F and G respectively, then 

$$\max_{\substack{u\in F\\ \|u\|_2=1}}\ \max_{\substack{v\in G\\ \|v\|_2=1}} u^Tv = \max_{\substack{y\in\mathbb{R}^p\\ \|y\|_2=1}}\ \max_{\substack{z\in\mathbb{R}^q\\ \|z\|_2=1}} y^T(Q_F^TQ_G)z .$$
From the minimax characterization of singular values given in Theorem 
8.6.1 it follows that if $Y^T(Q_F^TQ_G)Z = \mathrm{diag}(\sigma_1,\ldots,\sigma_q)$ is the SVD of 
$Q_F^TQ_G$, then we may define the $u_k$, $v_k$, and $\theta_k$ by 

$$[\,u_1,\ldots,u_p\,] = Q_FY, \qquad [\,v_1,\ldots,v_q\,] = Q_GZ, \qquad \cos(\theta_k) = \sigma_k,\ \ k = 1{:}q.$$


Typically, the spaces F and G are defined as the ranges of given matrices 
$A \in \mathbb{R}^{m\times p}$ and $B \in \mathbb{R}^{m\times q}$. In this case the desired orthonormal bases can 
be obtained by computing the QR factorizations of these two matrices. 


Algorithm 12.4.3 Given $A \in \mathbb{R}^{m\times p}$ and $B \in \mathbb{R}^{m\times q}$ (p ≥ q) each with linearly independent columns, the following algorithm computes the matrices $U = [\,u_1,\ldots,u_q\,]$ and $V = [\,v_1,\ldots,v_q\,]$, which have orthonormal columns, and $\cos(\theta_1),\ldots,\cos(\theta_q)$ 
such that the $\theta_k$ are the principal angles between ran(A) and ran(B) and 
$u_k$ and $v_k$ are the associated principal vectors. 


Use Algorithm 5.2.1 to compute the QR factorizations 

  $A = Q_AR_A$,  $Q_A^TQ_A = I_p$,  $R_A \in \mathbb{R}^{p\times p}$ 
  $B = Q_BR_B$,  $Q_B^TQ_B = I_q$,  $R_B \in \mathbb{R}^{q\times q}$ 

$C = Q_A^TQ_B$ 
Compute the SVD $Y^TCZ = \mathrm{diag}(\cos(\theta_k))$. 
$Q_AY(:,1{:}q) = [\,u_1,\ldots,u_q\,]$ 
$Q_BZ = [\,v_1,\ldots,v_q\,]$ 


This algorithm requires about $4m(q^2 + 2p^2) + 2pq(m + q) + 12q^3$ flops. 

The idea of using the SVD to compute the principal angles and vectors 
is due to Björck and Golub (1973). The problem of rank deficiency in A 
and B is also treated in this paper. 
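Here is a NumPy sketch of Algorithm 12.4.3. numpy.linalg.qr in its default reduced mode plays the role of Algorithm 5.2.1, and the function name is hypothetical. The cosines are clipped to [0, 1] because rounding can push a unit cosine slightly above 1. On the data of Example 12.4.3 below, the sketch should give cosines close to 1.000 and .856.

```python
import numpy as np

def principal_angles(A, B):
    """Sketch of Algorithm 12.4.3 for A (m-by-p), B (m-by-q), p >= q, full column rank.

    Returns the cosines of the principal angles and the principal vectors U, V."""
    QA, _ = np.linalg.qr(A)                   # m-by-p, orthonormal columns
    QB, _ = np.linalg.qr(B)                   # m-by-q
    C = QA.T @ QB
    Y, cosines, Zt = np.linalg.svd(C)         # Y^T C Z = diag(cos(theta_k))
    q = B.shape[1]
    U = QA @ Y[:, :q]
    V = QB @ Zt.T
    return np.clip(cosines, 0.0, 1.0), U, V

A = np.array([[1, 2], [3, 4], [5, 6]], dtype=float)
B = np.array([[1, 5], [3, 7], [5, -1]], dtype=float)
cosines, U, V = principal_angles(A, B)
print(np.round(cosines, 3))
print(np.degrees(np.arccos(cosines)))         # the principal angles in degrees
```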


12.4.4 Intersection of Subspaces 


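Algorithm 12.4.3 can also be used to compute an orthonormal basis for 
$\mathrm{ran}(A) \cap \mathrm{ran}(B)$ where $A \in \mathbb{R}^{m\times p}$ and $B \in \mathbb{R}^{m\times q}$. 


Theorem 12.4.2 Let $\{\cos(\theta_k), u_k, v_k\}_{k=1}^{q}$ be defined by Algorithm 12.4.3. 
If the index s is defined by $1 = \cos(\theta_1) = \cdots = \cos(\theta_s) > \cos(\theta_{s+1})$, then 
we have 

$$\mathrm{ran}(A) \cap \mathrm{ran}(B) = \mathrm{span}\{u_1,\ldots,u_s\} = \mathrm{span}\{v_1,\ldots,v_s\}.$$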
Proof. The proof follows from the observation that if $\cos(\theta_k) = 1$, then 
$u_k = v_k$. □ 


With inexact arithmetic, it is necessary to compute the approximate multiplicity of the unit cosines in Algorithm 12.4.3. 


Example 12.4.3 If 

$$A = \begin{bmatrix}1 & 2\\ 3 & 4\\ 5 & 6\end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix}1 & 5\\ 3 & 7\\ 5 & -1\end{bmatrix}$$

then the cosines of the principal angles between ran(A) and ran(B) are 1.000 and .856. 
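By Theorem 12.4.2, an orthonormal basis for ran(A) ∩ ran(B) is given by the principal vectors whose cosines are numerically equal to one. The NumPy sketch below applies this to Example 12.4.3. The function name and the tolerance `tol`, which decides the approximate multiplicity of the unit cosines, are assumptions. In this example the first columns of A and B coincide, so the intersection is the line through $[\,1,\ 3,\ 5\,]^T$.

```python
import numpy as np

def range_intersection(A, B, tol=1e-10):
    """Orthonormal basis for ran(A) intersect ran(B) via principal vectors (Theorem 12.4.2).

    Cosines within tol of 1 are counted as unit cosines."""
    QA, _ = np.linalg.qr(A)
    QB, _ = np.linalg.qr(B)
    Y, cosines, _ = np.linalg.svd(QA.T @ QB)
    s = int(np.sum(cosines > 1.0 - tol))      # approximate multiplicity of cos = 1
    return QA @ Y[:, :s]                      # u_1, ..., u_s

A = np.array([[1, 2], [3, 4], [5, 6]], dtype=float)
B = np.array([[1, 5], [3, 7], [5, -1]], dtype=float)
X = range_intersection(A, B)
print(X.shape[1], X[:, 0] / X[0, 0])          # dimension and normalized basis vector
```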


Problems 


P12.4.1 Show that if A and B are m-by-p matrices, with p ≤ m, then 

$$\min_{Q^TQ = I_p}\|A - BQ\|_F^2 = \sum_{i=1}^{p}\left(\sigma_i(A)^2 - 2\sigma_i(B^TA) + \sigma_i(B)^2\right).$$


P12.4.2 Extend Algorithm 12.4.2 so that it can compute an orthonormal basis for 
$\mathrm{null}(A_1) \cap \cdots \cap \mathrm{null}(A_s)$. 


P12.4.3 Extend Algorithm 12.4.3 to handle the case when A and B are rank deficient. 


P12.4.4 Relate the principal angles and vectors between ran(A) and ran(B) to the 
eigenvalues and eigenvectors of the generalized eigenvalue problem 

$$\begin{bmatrix}0 & A^TB\\ B^TA & 0\end{bmatrix}\begin{bmatrix}x\\ y\end{bmatrix} = \sigma\begin{bmatrix}A^TA & 0\\ 0 & B^TB\end{bmatrix}\begin{bmatrix}x\\ y\end{bmatrix}.$$


P12.4.5 Suppose $A, B \in \mathbb{R}^{m\times n}$ and that A has full column rank. Show how to compute 
a symmetric matrix $X \in \mathbb{R}^{n\times n}$ that minimizes $\|AX - B\|_F$. Hint: Compute the SVD 
of A. 


Notes and References for Sec. 12.4 


The problem of minimizing $\|A - BQ\|_F$ over all orthogonal matrices arises in psychometrics. See 


B. Green (1952). "The Orthogonal Approximation of an Oblique Structure in Factor 
Analysis,” Psychometrika 17, 429-40. 

P. Schonemann (1966). “A Generalized Solution of the Orthogonal Procrustes Problem,” 
Psychometrika 31, 1-10. 

I.Y. Bar-Itzhack (1975). “Iterative Optimal Orthogonalization of the Strapdown Ma- 
trix,” IEEE Trans. Aerospace and Electronic Systems 11, 30-37. 

R.J. Hanson and M.J. Norris (1981). “Analysis of Measurements Based on the Singular 
Value Decomposition,” SIAM J. Sci. and Stat. Comp. 2, 363-374. 

H. Park (1991). “A Parallel Algorithm for the Unbalanced Orthogonal Procrustes Prob- 
lem,” Parallel Computing 17, 913-923. 


When B = I, this problem amounts to finding the closest orthogonal matrix to A. This 
is equivalent to the polar decomposition problem of §4.2.10. See 


Å. Björck and C. Bowie (1971). “An Iterative Algorithm for Computing the Best Esti- 
mate of an Orthogonal Matrix,” SIAM J. Num. Anal. 8, 358-64. 


N.J. Higham (1986). “Computing the Polar Decomposition—with Applications,” SIAM 
J. Sci. and Stat. Comp. 7, 1160-1174. 


If A is reasonably close to being orthogonal itself, then Björck and Bowie's technique is 
more efficient than the SVD algorithm. 


The problem of minimizing $\|AX - B\|_F$ subject to the constraint that X is symmetric is studied in 


N.J. Higham (1988). “The Symmetric Procrustes Problem,” BIT 28, 133-43. 


Using the SVD to solve the canonical correlation problem is discussed in 


Å. Björck and G.H. Golub (1973). “Numerical Methods for Computing Angles Between 
Linear Subspaces,” Math. Comp. 27, 579-94. 


G.H. Golub and H. Zha (1994). “Perturbation Analysis of the Canonical Correlations of 
Matrix Pairs,” Lin. Alg. and Its Applic. 210, 3-28. 


The SVD has other roles to play in statistical computation. 


S.J. Hammarling (1985). “The Singular Value Decomposition in Multivariate Statistics,” 
ACM SIGNUM Newsletter 20, 2-25. 


12.5 Updating Matrix Factorizations 


In many applications it is necessary to re-factor a given matrix $A \in \mathbb{R}^{m\times n}$ 
after it has been altered in some minimal sense. For example, given that 
we have the QR factorization of A, we may need to calculate the QR factorization of a matrix $\tilde A$ that is obtained by (a) adding a general rank-one 
matrix to A, (b) appending a row (or column) to A, or (c) deleting a row 
(or column) from A. In this section we show that in situations like these, it 
is much more efficient to "update" A's QR factorization than to generate it 
from scratch. We also show how to update the null space of a matrix after 
it has been augmented with an additional row. 


Before beginning, we mention that there are also techniques for updating the factorizations PA = LU, $A = GG^T$, and $A = LDL^T$. Updating 
these factorizations, however, can be quite delicate because of pivoting requirements and because when we tamper with a positive definite matrix the 
result may not be positive definite. See Gill, Golub, Murray, and Saunders 
(1974) and Stewart (1979). Along these lines we briefly discuss hyperbolic 
transformations and their use in the Cholesky downdating problem. 

Familiarity with §3.5, §4.1, §5.1, §5.2, §5.4, and §5.5 is required. Complementary reading includes Gill, Murray, and Wright (1991). 
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12.5.4 Rank-One Changes 


Suppose we have the QR factorization QR = B € IR™*” and that we need 
to compute the QR factorization B + uv? = Q,R, where u,v € R” are 
given. Observe that 

B +u? = Q(R+u0v") (12.5.1) 
where w = QT u. Suppose that we compute rotations J4 .1,..., Ja, J) such 
that 

JT...JT iw = || w ||gei - 

Here, each Jj is a rotation in planes k and k+1. (For details, see Algorithm 


5.1.3.) If these same Givens rotations are applied to R, it can be shown 
that 


H - JT ...JI jR (12.5.2) 


is upper Hessenberg. For example, in the n — 4 case we start with 


X X X X x 
0 x x x x 
R- 0 0 x x vel x 
0 0 0 x x 
and then update as follows: 
X X X X x 
_ fp 0 x x x o mT x 
R= JR = 0 0 x x w = Jw = x 
0 0 x x 0 
X X X X x 
0 x x x x 
— JTR — — TIm — 
R= J,R = Ox x x w= hw = 0 
0 0 x x 0 
X X X X x 
_ Tp X X X X _ oa. 0 
H=JR= 0 x x x w= Jw = 0 
0 0 x x 0 
Consequently, 
(JP ---JT_,)(R+ wT) = H || w lev? = Hi (12.5.3) 


is also upper Hessenberg. 

In Algorithm 5.2.3, we show how to compute the QR factorization of an 
upper Hessenberg matrix in O(n?) flops. In particular, we can find Givens 
rotations Gy , k = 1:n — 1 such that 


GT_,.-GTH, = R, (12.5.4) 
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is upper triangular. Combining (12.5.1) through (12.5.4) we obtain the QR 
factorization B + uv? = QR, where 


Qi = QJa 1:7: 164): Gs. 


A careful assessment of the work reveals that about 26n? flops are required. 
The vector w — QT u requires 2n? flops. Computing H and accumulating 
the J, into Q. involves 12n? flops. Finally, computing Rı and multiplying 
the G, into Q involves 12n? flops. 

The technique readily extends to the case when B is rectangular. It can 
also be generalized to compute the QR factorization of B -- UVT where 
rank(UVT) — p » 1. 


12.5.2  Appending or Deleting a Column 


Assume that we have the QR factorization 
QR = A = [a...,a4] a; € R” (12.5.5) 


and partition the upper triangular matrix R € IR™*” as follows: 


Ry v R3 k-1 
R= 0 Tkk wt 1 
n 0 0 R33 m-—k 


k—1 1 n—-k 
Now suppose that we want to compute the QR factorization of 
A = [a..., Gk G1, 05] € R” (2-1) | 


Note that A is just A with its kth column deleted and that 


. Ry Ha 
QTÁ = 0 wf = H 

is upper Hessenberg, e.g., 

X X X X X 

0 x x x x 

0 0 x x x 

H = 0 0 x x x m=7,n=6,k=3 

0 0 0 x x 

0.00 0 x 

0.000 0 
Clearly, the unwanted subdiagonal elements hk+1,k;-- <, Rn,n—-1 can be ze- 


roed by a sequence of Givens rotations: GT ,-..GTH = R,. Here, G; is 
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a rotation in planes 7 and i+1 for i= k:n—1. Thus, if Q1 = QG,---Gr_1 
then A = Q, R; is the QR factorization of A. 

The above update procedure can be executed in O(n?) flops and is 
very useful in certain least squares problems. For example, one may wish 
to examine the significance of the kth factor in the underlying model by 
deleting the kth column of the corresponding data matrix and solving the 
resulting LS problem. 

In a similar vein, it is useful to be able to compute efficiently the solution 
to the LS problem after a column has been appended to A. Suppose we have 
the QR factorization (12.5.5) and now wish to compute the QR factorization 
of 


A= [@1,-++, Gk, Z, Gk 415-50] 
where z € R” is given. Note that if w = QT z then 
QTA = [Q?ay,...,Q7 as, w, QT asi... QT an | = A 


is upper triangular except for the presence of a “spike” in its k+1-st column, 
eg., 


X X X X X X 
0 x x X X Xx 
7 0 0 x x Xx x 
A=|0 00x x x m=7,n=5,k=3 
0 0 0 x 0 x 
0.0 0 x 0 0 
0 00 x 0 0 
It is possible to determine Givens rotations Jj,..1,..., Jk41 so that 
ui 
T T Wk+1 
Jeti Imi = 0 
0 


with JL --- JZ _ Å = Ř upper triangular. We illustrate this by continuing 
with the above example: 


Hm 

II 

on 

D 

II 
eooooox 
oOococooxx 
cOOOOXXxx 
OXXXXXX 
OOO X X X X 
OO XxX XX X X 
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X X X X X X 
0 x x x x xX 
0 0 x x x x 
H-JH-|0 00 x x x 
0 0 0 x O x 
0 0 0 0 0 x 
0 0 0 0 0 0 
X X X X X X 
0 x x x x x 
0 0 x x x x 
H-JlH-|000 x x x 
0.00 0 x x 
0 0 0 0 0 x 
0 0 0 0 0 0 


This update requires O(mn) flops. 


12.5.3  Appending or Deleting a Row 


Suppose we have the QR factorization QR = A € IR"*" and now wish to 
obtain the QR factorization of 


~ wr 
a= [5 | 
where w € R”. Note that 
. T 
diag(1,QT)ÀA = |e | =H 


is upper Hessenberg. Thus, Givens rotations J4,..., Jn could be determined 
so JT ... JTH = R; is upper triangular. It follows that 


A-2QiR 


is the desired QR factorization, where Q1 = diag(1, Q)J1 <- Js. 

No essential complications result if the new row is added between rows 
k and k + 1 of A. We merely apply the above with A replaced by PA and 
Q replaced by PQ where 


_ O0 Im-k 
p= |i, 0 | 


Upon completion diag(1, PT)Q, is the desired orthogonal factor. 
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Lastly, we consider how to update the QR factorization QR = A € R™*” 
when the first row of A is deleted. In particular, we wish to compute the 
QR factorization of the submatrix A, in 


zT 1 
A= P m-1 


(The procedure is similar when an arbitrary row is deleted.) Let q7 be the 
first row of Q and compute Givens rotations G;,..., G4 1 such that 


T T 
Gi Gad = Q], 


where a = +1. Note that 
H = GT..G*, |R = mn 1 
is upper Hessenberg and that 
QGa i6 = É >| 
where Q; € IR(^-D*(7-U is orthogonal. Thus, 


A- La] = (QGm-1 G1) (G1 GR) = É Qı | | m | 


trom which we conclude that A; = Q4, is the desired QR factorization. 


12.5.4 Hyperbolic Transformation Methods 


Recall that the “R” in A = QR is the transposed Cholesky factor in AT A = 
GGT. Thus, there is a close connection between the QR modifications just 
discussed and analogous modifications of the Cholesky factorization. We 
illustrate this with the Cholesky downdating problem which corresponds to 
the removal of an A-row in QR. In the Cholesky downdating problem we 
have the Cholesky factorization 


T T ot TY zT 
GG = aa =|] P (12.5.6) 
where A € IR?** with m > n and z € R”. Our task is to find a lower 
triangular G; such that G1GT = AT A,. There are several approaches to 
this interesting and important problem. Simply because it is an opportunity 
to introduce some new ideas, we present a downdating procedure that relies 
on hyperbolic transformations. 
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We start with a definition. H € IR™*™ is pseudo-orthogonal with respect 
to the signature matriz S = diag(+1) € R™*™ if HTSH = S. Now from 
(12.5.6) we have AT A = ATA, + zz? = GGT and so 


T 
ATA, = ATA- z;T = GGT — z;T = (63 | ^ E | | $ | 


Define the signature matrix 


g= [7 4 (12.5.7) 


and suppose that we can find H € IR(**DX(^*U such that HTSH = S with 
the property that 


T T 
n| S | - K | (12.5.8) 
is upper triangular. It follows that 
T 
ATA, — (6 s] "si | Sr | - cols] S| = GGT 


is the sought after Cholesky factorization. 
We now show how to construct the hyperbolic transformation H in 
(12.5.8) using hyperbolic rotations. A 2-by-2 hyperbolic rotation has the 


form 
w= | |= [St]. 


Note that if H c IR2*? is a hyperbolic rotation then HTSH = S where S 
— diag(-1,1). Paralleling our Givens rotations developments, let us see how 
hyperbolic rotations can be used for zeroing. From 


oglej G] eee 


we obtain the equation cro = sx). Note that there is no solution to this 
equation if xı = z2 Æ 0, a clue that hyperbolic rotations are not as nu- 
merically solid as their Givens rotation counterparts. If x; # x2 then it is 
possible to compute the cosh-sinh pair: 
if X9 = 0 
s=0;c=1 
else (12.5.9) 
if [22] « [zii 
T = 22/21; c 2 l/ V1 - 12; s cr 
elseif |zi| < |z2| 
T —z1/23; s = 1 V1— T2; c= st 
end 
end 
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Observe that the norm of the hyperbolic rotation produced by this algo- 
rithm gets large as xı gets close to ro. 

Now any matrix H = H(p,n 4- 1,0) c IR'* DX *U that is the identity 
everywhere except hpp = Aniin+1 = cosh(0) and hpn+1 = hai, = 
—sinh(0) satisfies H? SH = S where S is prescribed in (12.5.7). Using 
(12.5.9), we attempt to generate hyperbolic rotations Hy = H(1,k,4,) for 


k = 2:n +1 so that 
GT GT 
8m - [5 |. 


This turns out to be possible if A has full column rank. Hyperbolic rotation 
H, zeros entry (k + 1, k). In other words, if A has full column rank, then 
it can be shown that each call to (12.5.9) results in a cosh-sinh pair. See 
Alexander, Pan, and Plemmons (1988). 


12.5.5 Updating the ULV Decomposition 


Suppose A € R™*” is rank deficient and that we have a basis for its null 
space. If we add a row to A, 


then how easy is it to compute a null basis for A? When a sequence of 
such update problems are involved the issue is one of tracking the null 
space. Subspace tracking arises in à number of real-time signal processing 
applications. 

Working with the SVD is awkward in this context because O(n?) flops 
are required to recompute the SVD of a matrix that has undergone a unit 
rank perturbation. On the other hand, Stewart (1993) has shown that the 
null space updating problem becomes O(n?) per step if we properly couple 
the ideas of condition estimation of $3.5.4 and complete orthogonal decom- 
position. Recall from 55.4.2 that a complete orthogonal decomposition is 
two-sided and reveals the rank of the underlying matrix, 


UTAV = | Tu 0 
0 


0 | » Tau €R™', r=rank(A). 


A pair of QR factorizations (one with column pivoting) can be used to 
compute this. In this case Ti) = L is lower triangular in exact arithmetic. 
But with noise and roundoff we instead compute 
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L 0 
UTAV=|H E (12.5.10) 
0 0 


where L € R™? and E € IR(^-*(^7? are lower triangular and H and E 
are "small" compared to c;,;4(L). In this case we refer to (12.5.10) as a 
rank-revealing ULV decomposition.! Note that if 


V2[M WV] U=([U U2 | 


T n—r r m-—r 
then the columns of V2 define an approximate null space: 


| AV |l = IU2E lle S | E lz 


Our goal is to produce a rank-revealing U LV decomposition for the row- 
appended matrix A. To be more specific, our aim is to show how to produce 
updates of L, E, H, V, and (possibly) the rank in O(n?) flops. 

Note that 


L 0 
Uol[A], |H E 
los [e ]v-| 3 o 

wt yT 


By permuting the bottom row up “underneath” H and E we see that the 
challenge is to compute a rank-revealing ULV decomposition of 


(12.5.11) 


Clr r rs wwe oO 
Slr rare eco 
Slr rare ooo 


£ 
£ 
£ 
£ 
h 
h 
h 
w 


in O(n?) flops. Here and in the sequel, we set r = 4 and n = 7 to illustrate 
the main ideas. Bear in mind that the A and e entries are small and that 
1Dual to this is the URV decomposition in which the rank-revealing form is upper 


triangular. There are updating situations that sometimes favor the manipulation of this 
form instead of ULV. 
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we have deduced that the numerical rank is four. In practice, this involves 
comparisons with a small tolerance as discussed in §5.5.7. 

Using zeroing techniques similar to those presented in §12.5.3, the bot- 
tom row can be zeroed with a sequence of row rotations giving 


OX X XIX xoo 
O[xXxXx|xooo 
oOx x xjoooo 
ojx cojccoo 


OX X XIX XXO 


DjX X X|X XXX 


Because this zeroing process intermingles the (presumably large) entries of 
the bottom row with the entries from each of the other rows, the triangular 
form typically is not rank revealing. However, we can restore the rank- 
revealing structure with a combination of condition estimation and zero- 
chasing with rotations. Let us assume that with the added row, the new 
null space has dimension two. 

With a reliable condition estimator we produce a unit 2-norm vector p 
such that 


l| PTL |z = omis (L). 
See §3.5.4. Rotations (Ui,1)9., can be found such that 


Ug UseU SUSU BU iDP = es = Ia: 8). 


The matrix 
H = ULULULULULULL 


is lower Hessenberg and can be restored to a lower triangular form L+ by 
a sequence of column rotations: 


L4 = HVizVasVaa Vas Vs6 V67. 


It follows that 


el Ly = (ed H) Vi2Vz23V34Vas Vs6 Ve? = (p"Z) Vi2V23 V34 Vas Vse Vor 


has approximate norm Omin(L). Thus, we obtain a lower triangular matrix 


of the form 
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with small h's and e. We can repeat the condition estimation and zero 
chasing on the leading 6-by-6 portion thereby producing (perhaps) another 
row of small numbers: 


(If not, then the revealed rank is 6.) Continuing in this way, we can restore 
any lower triangular matrix to rank-revealing form. 

In the event that the y vector in (12.5.11) is small, we can reach rank- 
revealing form by a different, more efficient route. We start with a sequence 
of left and right Givens rotations to zero all but the first component of y: 


ele o ojo ooo 
ejo o cocco 
ojlo xn cjoooo 
= 
3 


| 
Sl rr rls & o 
E P rats & & © 
Rlr Sr joo 
R|z£m-£6m5|€9ooco 
| 
Rr rr se & & & 
Wiz Sr re & & o 
Rl Sr ST Se ese CO 
<8 Pt £m. o0oococo 
ele a ocloooo 
ole oocloooo 


V56 Use 


or c|locoooco 


Br rate 9o o 
Rl rr cmo 
Rl rr re @& CO O 
RIS rare ooo 
cic o aef/oocoe 
ojo cocooe 
Rr D ocv & & & 
Rr r aS & & © 
RISD as & OC O 
Slr rats CO OC CO 
cic o cocco 
ojo >o ojoo o o 
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Here, “U;;” means a rotation of rows i and j and “V;;” means a rotation of 
columns i and j. It is important to observe that there is no intermingling 
of small and large numbers during this process. The A's and e's are still 
small. 


Following this, we produce a sequence of rotations that transform the 
matrix to 


£0 0 0 0 0 
£ £00 0 0 
£ £ 2 0 0 0 
£ £ £ £ 0 0 
hh h h 0 0 (12.5.12) 
h h h h e 0 
h h h h e e 
Vy y y 0 0 
where all the y’s are small: 
£0 0 0j0 0 0 £0 0 0/0 0 0 
£ £0 0);0 0 0 £ £20 0)0 0 0 
£ £ £ 0[0 0 0 £ £ £ Olp 0 0 
U4g £ £ £ £p 0 0 U3g £ £ £ L| 0 0 
— h h h hje 0 O0 — h h h hje 0 0 
h h h hje ee h h h hje e0 
h h h hle e e h h h hje ee 
rz rx Oly 00 rz 0 Oly 00 
£0 0 0;0 0 0 £0 0 0} p 00 
£ £ 0 Ojp 0 0 £ £ 0 O}p 00 
£ £ £ Olp 00 £ £ £2 0} wp 00 
Uz8 £ £ £ £p 0 0 Uia £ £ £ t| p 00 
— h h h hje e0 — h h hhje OO 
h h h hje e0 h h h hie e O0 
h h h h|e e e h hh h|e e e 
rz 000[|y 00 0 0 0 0 0 0 


Note that y.» is small because of 2-norm preservation. Column rotations 
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in planes (1,5), (2,5), (3,5), and (4,5) can remove the j's: 


£0 0 0 £00 0);0 0 0 
£ £00 £ £ 0 0/0 0 0 
£ £ £0 £ £ £ Olp 00 
Vis £ £ £ £ Vos £ £ £ dip 00 
— h h h R — |h hh hje 00 
h h h R h h h hje e0 
h h h h h h h h|e e e 
y 0 0 0 y y 00|[y 00 
£0 0 0 £0 0 0/0 0 0 
£ £00 £ £ 0 010 0 0 
£ £ £0 £ £ £ 0/0 00 
Vas £ £ t t£ Vas £ £ £ £10 0 0 
— h h h Rh — h h h hje 0 0 
h h h h h h h hje e0 
h h h h h h h hje e e 
y yyo y y y yjy 0 0 
thus producing the structure displayed in (12.5.12). All the y’s are small 
and thus a sequence of row rotations Us7,U47,...,U17, can be constructed 
to clean out the bottom row giving the rank-revealed form 
£ 0 0 0/0 0 0 
£ £ 0 0/0 0 0 
£ £ £ 0/0 0 0 
£ £ £ £|00 0 
h h h hje 0 0 
h h h hje e0 
h h h hje e e 
0 0 0 0/0 0 0 


Problems 


P12.5.1 Suppose we have the QR factorization for A € R™*" and now wish to mini- 
mize || (A + uvT)z — b ||; where u,b € R™ and v € R” are given. Give an algorithm for 
solving this problem that requires O(mn) flops. Assume that Q must be updated. 
P12.5.2 Suppose we have the QR factorization QR = A € R'**". Give an algorithm 
for computing the QR factorization of the matrix A obtained by deleting the kth row of 
A. Your algorithm should require O(mn) flops. 

P12.5.3 Suppose T € R?*" is tridiagonal and symmetric and that v € R”. Show how 
the Lanczos algorithm can be used (in principle) to compute an orthogonal Q € R”*” 
in O(n?) flops such that QT(T + vvT)Q = T is also tridiagonal. 

P12.5.4 Suppose 


A cT n (m—1)xn 
= B cec R”, BER 
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has full column rank and m > n. Using the Sherman-Morrison-Woodbury formula show 
that 


d £4 1 , MAA 
Omin(B) T Gmin(A) | 1- cP (ATA)~1c 


P12.5.5 As a function of zı and 22, what is the 2-norm of the hyperbolic rotation 
produced by (12.5.9)? 


P12.5.6 Show that the hyperbolic reduction in §12.5.4 does not breakdown if A has 


full column rank. 
P12.5.7 Assume 
A= R H 
~ 0 E 
I E Hl; 


= ——— «1. 
P Omin(R) 


e-[8 or | 


[o 2][2: 82] - (2 2 |: 


then || J |l; X all H |l; 


where R and E are square with 


Show that if 


is orthogonal and 
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12.6 Modified/Structured Eigenproblems 


In this section we treat an array of constrained, inverse, and structured 
eigenvalue problems. Although the examples are not related, collectively 
they show how certain special eigenproblems can be solved using the basic 
factorization ideas presented in earlier chapters. 

The dependence of this section upon earlier portions of the book is as 
follows: 


885.1, 5.2, 8.1, 8.3 — 8126.1 
888.1, 8.3, 9.1 — §12.6.2 
884.7, 8.1 — §12.6.3 
885.1, 5.2, 5.4, 7.4, 81, 8.2, 8.3,86 — §12.6.4 


12.6.1 A Constrained Eigenvalue Problem 


Let A € IR?** be symmetric. The gradient of r(x) = zT Az/z™ z is zero if 
and only if x is an eigenvector of A. Thus the stationary values of r(x) are 
therefore the eigenvalues of A. 

In certain applications it is necessary to find the stationary values of r(x) 
subject to the constraint CT x = 0 where C € R™? with n > p. Suppose 


QTCZ = E o] MN r = rank(C) 
r p-r 


is a complete orthogonal decomposition of C. Define B € IR**^ by 


B B T 
T _ _ 11 12 
Q AQ=B= E 2s] n—r 
T n—r 
and set 
_ T _ u T 
y-Qzc- I] n—r 


Since CT x = 0 transforms to ST = 0, the original problem becomes one of 
finding the stationary values of r(y) = y" By/yT y subject to the constraint 
that u = 0. But this amounts merely to finding the stationary values 
(eigenvalues) of the (n — r)-by-(n — r) symmetric matrix Boo. 
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12.6.2 Two Inverse Eigenvalue Problems 


Consider the r = 1 case in the previous subsection. Let Àj € ... < À4. i be 
the stationary values of zT Az/zT x subject to the constraint c? x = 0. From 
Theorem 8.1.7, it is easy to show that these stationary values interlace the 
eigenvalues À; of A: 


An € Ani S Anaa € S 2 SA SA. 


Now suppose that A has distinct eigenvalues and that we are given the 
values \j,-.-,An—1 that satisfy 


An < Anni < Ane <0 <2 <A < A. 


We seek to determine a unit vector c € IR" such that the À; are the station- 
ary values of x7 Az subject to z^ x = 1 and cTz = 0. 

In order to determine the properties that c must have, we use the method 
of Lagrange multipliers. Equating the gradient of 


ó(z,A,u) = 2? Ax — Ma? x — 1) + 2uzTc 


to zero we obtain the important equation (A —AJ)x = —yc. Thus, A— AJ is 
nonsingular and so x = —4(A — AI)~!c. Applying c? to both sides of this 
equation and substituting the eigenvalue decomposition QT AQ = diag(A;) 
we obtain 


where d = QTc, i.e., 


pA) = Sod? [[0 - 5 = o. 


i=1 — j=l 


jr 
Notice that 1 = || c || = || d |a = d? +- - -+d is the coefficient of (—2)^-!. 
Since p(A) is a polynomial having zeroes \1,..., An—1 we must have 
n-1 . 
pA) = [JO;-»). 
j=l 


It follows from these two formulas for p(A) that 


n-1 . 

I[Ó; - Ax) 
d = 2—__ k= 1. (12.6.1) 
TEO- Ax) 
j=1 


IF 
jzk 
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This determines each d, up to its sign. Thus there are 2” different solutions 
c = Qd to the original problem. 

A related inverse eigenvalue problem involves finding a tridiagonal ma- 
trix 


[o fh "m 0 
Bo : 
T = 
: "S E Bn-1 
0 ce Bn-1 An 
such that T has prescribed eigenvalues {à1,..., An} and T(2:n, 2:n) has 
prescribed eigenvalues {A},..-, A4 1) with 


Ai >A, > Ag >! > Ane > An-1 > An. 


We show how to compute the tridiagonal T via the Lanczos process. Note 
that the A; are the stationary values of 


y? Ay 

yTy 

subject to dT y = 0 where A = diag(A1,..., An) and d is specified by (12.6.1). 
If we apply the Lanczos iteration (9.1.3) with A = A and qi = d, then it 
produces an orthogonal matrix Q and a tridiagonal matrix T such that 
QTAQ = T. With the definition z = QTy, it follows that the À; are the 
stationary values of 


(y) = 


TT 
V) = Ers 


subject to ez = 0. But these are precisely the eigenvalues of T(2:n, 2:n)! 
12.6.3 A Toeplitz Eigenproblem 
1 rT 
7-[r | 


is symmetric, positive definite, and Toeplitz with r € IR"~!. Our goal is to 
compute the smallest eigenvalue A444 (T) of T given that 


Assume that 


Amin (T) € Amin (G). 


This problem is considered in Cybenko and Van Loan (1986) and has ap- 
plications in signal processing. 


Suppose 
1 vT][o] _ [e 
r G yi yl? 
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a+rľy = da 
ar+ Gy = dy. 


If A ¢ A(G), then y = —a(G — AI)^!r, a £0, and 
a+r? [-o(G — AI) tr] = Ao. 
Thus, A is a zero of the rational function 
f(A) 1-A-rT(G — AI)r. 


We have dealt with similar functions in 88.5 and 812.1. In this case, f 
always has a negative slope 


FPA) = -1 -|I (G= AD? |3 < t 
If À < Amin(G), then it also has a negative second derivative: 
f"(A) = -2rT(G — Al)~3r < 0. 
Using these facts it can be shown that if 
Amin (T) € A9 < Amin(G), (12.6.2) 


then the Newton iteration 


AUD 
AD n AGO E (12.6.3) 


converges to Amin(T) monotonically from the right. Note that 


14 rTw — AO) 


AGED — AG) 
* lc wTw 


where w solves the "shifted" Yule-Walker system 
(G — A Tw = —r. 


Since, A“) < \min(G), this system is positive definite and Algorithm 4.7.1 
is applicable if we simply apply it to the normalized Toeplitz matrix (G — 
A02 Dy/(1 — A(Q), 

A starting value that satisfies (12.6.2) can be obtained by examining 
the Durbin algorithm when it is applied to T, = (T — AJ)/(1 — A). For 
this matrix the “r” vector is r/(1 — A) and so the Durbin algorithm (4.7.1) 
transforms to 
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r —r/(1—2) 
yQO — —T] 
for k=1:n-1 
fk — 14 [r(9]Ty COO (12.6.4) 


ak = —(rk41 + r7 p,y()/9, 
y(t) = 20 

Ak 
end 


From the discussion in §4.7.2 we know that 61,..., 6k > 0 implies that 
T)(1:k + 1, 1:k + 1) is positive definite. Hence, a suitably modified (12.6.4) 
can be used to compute m(A), the largest index m such that (1,..., Bm are 
all positive but that Bm4i € 0. Note that if m(A) = n — 2, then (12.6.2) 
holds. This suggests the following bisection scheme: 


Choose L and R so L € Xmin(T) < Amin(G) € R. 
Until m = n -2 
à = (L + R)/2 
m = m(X) 
if m«n-2 (12.6.5) 
R= 
end 
ifm=n-1 
L=x 
end 
end 


The bracketing interval [L, R] always contains a À such that m(A) = n — 2 
and so the current A has this property upon termination. 

There are several possible choices for a starting interval. One idea is to 
set L = 0 and R = 1 — [ri| since 


1 
0« Amin(T) < Amin(G) < Amin (| rı 7 }) =1- Ira | 
where the upper bound follows from Theorem 8.1.7. 
Note that the iterations in (12.6.4) and (12.6.5) involve at most O(n?) 
flops. A heuristic argument that O(log n) iterations are required is given 
in Cybenko and Van Loan (1986). 


12.6.4 An Orthogonal Matrix Eigenproblem 


Computing the eigenvalues and eigenvectors of a real orthogonal matrix 
A € R"*" is a problem that arises in signal processing, see Cybenko (1985). 
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The eigenvalues of A are on the unit circle and moreover, 


cos(0) +i sin(@) € A(A) «€  cos(0) € A (A) =X (254. 


This suggests computing Re(A(A)) via the Schur decomposition 


QT (23) Q = diag(cos(01),..., cos(0,)) 


and then computing Im(A(A)) with the formula s = V/1-— c?. Unfortu- 
nately, if |c| = 1, then this formula does not produce an accurate sine 
because of floating point cancellation. We could work with the skew- 
symmetric matrix (A — AT)/2 to get the “small sine" eigenvalues, but then 
we are talking about a method that requires a pair of full Schur decompo- 
sition problems and the approach begins to lose its appeal. 

A way around these difficulties that involves an interesting SVD ap- 
plication is proposed by Ammar, Gragg, and Reichel (1986). We present 
just the eigenvalue portion of their algorithm. The derivation is instructive 
because it involves practically every decomposition that we have studied. 

The first step is to orthogonally reduce A to upper Hessenberg form, 
QT AQ = H. (Frequently, A is already in Hessenberg form.) Without loss 
of generality, we may assume that H is unreduced with positive subdiagonal 
elements. 

If n is odd, then it must have a real eigenvalue because the eigenvalues 
of a real matrix come in complex conjugate pairs. In this case it is possible 
to deflate the problem with O(n) work to size n — 1 by carefully working 
with the eigenvector equation Hx = x (or Hx = —x). See Gragg (1986). 
Thus, we may assume that n is even. 

For 1 € k < n — 1, define the reflection G, € IR"“" by 


ha 00 0 
0 —eCk 8 0 
Ge = Gy(óx) = 0 » ce 0 


0 0 O0 Inka 


where cy = cos(x), Sk = sin($y), and 0 < k < m. It is possible to 
determine G,..., G4. such that 


H = (G1--- G4.) diag(1,...,1, —e4) 


where c, = +1. This is just the QR decomposition of H. The sines 
81,...,$,4-1 are the subdiagonal entries of H. The “R” matrix is diagonal 
because it is orthogonal and triangular. Since the determinant of a reflection 
is -1, det(H) = cn. This quantity is the product of H's eigenvalues and so 
if c, = —1, then (—1,1) C A(H). In this situation it is also possible to 
deflate. 
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So altogether we may assume that n is even and 


H = Gi(é1) - Ga-1(bn—-1)Gn (Gn) 


where Gn = Gn(¢n) = diag(1,...,1,—c4,) and c, = 1. Designate the 
sought after eigenvalues by 


MH) = (cos(8,) + i-sin(&) Fa (12.6.4) 


where m = n/2. 

The cosines c,,...,€n are called the Schur parameters and as we men- 
tioned, the corresponding sines are the subdiagonal entries of H. Using 
these numbers it is possible to construct. explicitly a pair of bidiagonal ma- 
trices Bc, Bs € IR"*" with the property that 


o(Bo(1:m, 1:m)) (cos(01/2), ...,c0s(8,,/2)) (12.6.5) 
c(Bs(l:m,1l:m)) = {sin(6)/2),...,sin(@m/2)} (12.6.6) 


The singular values of Bg(1:m,1:m) and Bs(1:m,1:m) can be computed 
using the bidiagonal SVD algorithm. The angle 4, can be accurately com- 
puted from sin(0,/2) if 0 < 0, < 2/2 and accurately computed from 
cos(0,/2) if 7/2 < 6, < m. The construction of Bc and Bs is based 
on three facts: 


1. H is similar to . 
H-H,H, 


where H, and H, are the odd and even reflection products 


Ho = GGz: Gs. 
He GoG4---Gn. 


These matrices are block diagonal with 2-by-2 and 1-by-1 blocks, i.e., 


H, = diag(R(¢1), R($3),.--»Rn-1)) (12.6.7) 
H. = diag(1, R(¢2), R(d4),.-.,R(dn—2),—-1) (12.6.8) 
where . 
(6) = | e ante |. (12.6.9) 
2. The eigenvalues of the symmetric tridiagonal matrices 
C= Hot He and  $- mH (12.6.10) 


are given by 


A(C) = {+c08(6)/2),...,+.cos(6m/2)} (12.6.11) 
{£sin(01/2),.-., +sin(8m/2)}. — (12.6.12) 


> 

~ 

un 

— 
| 
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3. It is possible to construct bidiagonalizations 
UECVe =Bco and UZ2SVs = Bs 


that satisfy (12.6.5) and (12.6.6). The transformations Uc, Vo, Us, 
and Vs are products of known reflections Gj and simple permutations. 


We begin the verification of these three facts by showing that H is 


similar to H,H,. The n = 8 case is sufficient for this purpose. Define the 
orthogonal matrix P by 


F = G3G4G5G6G7Gs8 
P= FF; F3 where F; = GsG6G7Gsg 
Fr = G7Gs. 


Since reflections are symmetric and G;G; = G;G; if |i — j| > 2, we see that 
FHF? = (GaG4G5GeGz7Gs)(0162G3G4GsGeG7Gs)(G3G4GsGeGzGg 


(G3G1GsGeG7Gs) GG 
= G,G3G3G4G5GeG7Gs, 


Fs(F3HF2)F2 = (GsGeG7Gsg)(G1G3G2G4GsGgG7Gg)(GsGeG7Gp)’, 
(GsGgG7Gg)G1G3G2Gq 
G1G3G5G5G4GgG7Gg 


PHPT 


F, (Fa F3H F? FP) FP 
(G7Gg)(G1G3GsG2G4GeG7Gs)(G7Gs)? 
= (G1G3G5G7)(G2G4GaGa) = H,H,. 


The second of the three facts that we need to establish relates the eigen- 
values of H = HoH. to the eigenvalues of the C and S matrices defined 
in (12.6.10). It follows from (12.6.7) and (12.6.8) that these matrices are 
symmetric, tridiagonal, and unreduced, e.g., 


—C] $1 0 0 
C = 1 $1 C1— C2 $2 0 
2 0 $2 C2 — C3 $3 
0 0 $3 c3 — C4 
—C] $1 0 0 
g = 1 $81 Cy +c — S2 0 
B 2 0 — $2 —C2 — C3 $3 
0 0 $3 C3 t c4 


By working with the definitions it is easy to verify that 


H+HT  H,H,-(H,H,)'! | H,H, + HeHo 2 


12.6. MonpiFIED/STRUCTURED EIGENPROBLEMS 629 


and 


Ë + HT  HoHe~(HoHe)"! _ HoHe-HeHo _ 
2 21 2i 
This shows that Re(A(H)) = A(2C? — I) and Im(A(H#)) = A(-2iCS) 
thereby establishing (12.6.11) and (12.6.12). 

Instead of thinking of these half-angle cosines and sines as eigenvalues 
of n-by-n matrices, it is more efficient to think of them as singular values 
of m-by-m matrices. This brings us to the bidiagonalization of C and S. 
The orthogonal equivalence transformations that carry out this task are 
based upon the Schur decompositions of H, and H,. A 2-by-2 reflection 
R($) defined by (12.6.9) has eigenvalues 1 and —1 and the following Schur 
decomposition: 


no nonem = [y 5]. 
Thus, if 


Qo = diag(R(¢1/2), R(63/2),..., R(¢n-1/2)) 
Qe diag(1, R(2/2), R(¢4/2),.-., R(¢n,-2/2), —1) 


then from (12.6.7) and (12.6.8) H, and H, have the following Schur decom- 
positions: 


QoHoQo = D, 
QeH.Qe De 


The matrices 


diag(1, —1,1, —1,---,1, —1) 
diag(1, 1, —1,1, —1,---,1, —1,- 1). 


co = QoC Qe = 50 (H, t He) Qe = 5 (Dol QoQe) + (QoQe)De) 
1 
QSQe = 3» (H, = He) Qe = 5 (Do(QoQe) = (QoQe)De) 


un 
2. 
- 
z 
II 


have the same singular values as C and S respectively. To analyze their 
structure we first note that QoQ, is banded: 


QoQe = 


oooocjoc XxX x 
ooo oc XK XK XK XK 
Oooo XK XK XK XK 
OOxX XxX xXOoo 
COX XxX Xoo 
X XX Xoooo 
X XXxoooo 
X XxXooooooco 
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(The main ideas from this point on are amply communicated with n = 8 
examples.) If Do(i,i) and De(j, j) have the opposite sign, then co =0 
from which we conclude that CC) has the form 


a à 0 0 0 0 0 0 

0 0 & 0 0 0 0 0 

0 ag 0 b3 0 0 0 0 

1 0 0 G3 0 b, 0 0 0 
CO-QUOQ. | o 0 0 a, 0 bs 0 0 
0 0 0 0 a 0 b 0 

0 0 0 0 0 a 0 0 

0 0 0 0 0 0 a, bg 


Analogously, if D,(i,i) and D.(j, j) have the same sign, then s — 0 from 
which we conclude that S$) has the form 


0.0 fh 0 0 0 0 0 

€2 dz 0 0 0 0 0 0 

0 0 ds 0 fz 0 0 0 

1 0 €4 0 d4 0 0 0 0 
$0 = Q,$Q, = 0.00 0 ds 0 fs 0 
0 0 0 eg; 0 dg 0 0 

0.0 00 0 0 d fr 

0.0 00 0 0 e O 


Row /column permutations of these matrices result in bidiagonal forms: 


Bo = C0([13572468],[12463578]) 
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Bs = $0([24681357],[12463578]) 


0 0 
0 0 
0 0 
0 0 
fi 90 
d3 fs 
0 ds 
0 0 


It is not hard to verify that a’s, b’s, d's, e's, and f's are all nonzero and 
this implies that the singular values of Bc (1:m, 1:m) and Bs(1:m, 1:m) are 
distinct. Since 


o(C) =0(Bc) = (cos(01/2), cos(01/2),...,cos(0,,/2), cos(Am/2) ) 
o(S) =o0(Bs) = 1sin(01/2), sin(0)/2),...,sin(64,/2), sin(64,/2) } 


we have verified (12.6.5) and (12.6.6). 


Problems 


P12.6.1 Let $A \in \mathbb{R}^{m\times n}$ and consider the problem of finding the stationary values of

$$R(x, y) = \frac{y^T A x}{\|y\|_2 \, \|x\|_2}, \qquad y \in \mathbb{R}^m, \; x \in \mathbb{R}^n,$$

subject to the constraints

$$C^T x = 0, \quad C \in \mathbb{R}^{n\times p}, \; n > p, \qquad D^T y = 0, \quad D \in \mathbb{R}^{m\times q}, \; m > q.$$

Show how to solve this problem by first computing complete orthogonal decompositions of $C$ and $D$ and then computing the SVD of a certain submatrix of a transformed $A$.


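P12.6.2 Suppose $A \in \mathbb{R}^{m\times n}$ and $B \in \mathbb{R}^{p\times n}$. Assume that $\operatorname{rank}(A) = n$ and $\operatorname{rank}(B) = p$. Using the methods of this section, show how to solve

$$\min_{Bx=0} \frac{\|b - Ax\|_2^2}{\|x\|_2^2 + 1} = \min_{Bx=0} \frac{\left\| [\,A \;\; b\,] \begin{bmatrix} x \\ -1 \end{bmatrix} \right\|_2^2}{\left\| \begin{bmatrix} x \\ -1 \end{bmatrix} \right\|_2^2}.$$

Show that this is a constrained TLS problem. Is there always a solution?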


P12.6.3 Suppose $A \in \mathbb{R}^{n\times n}$ is symmetric and that $B \in \mathbb{R}^{p\times n}$ has rank $p$. Let $d \in \mathbb{R}^p$. Show how to solve the problem of minimizing $x^T A x$ subject to the constraints $\|x\|_2 = 1$ and $Bx = d$. Indicate when a solution fails to exist.


P12.6.4 Assume that $A \in \mathbb{R}^{n\times n}$ is symmetric, large, and sparse and that $C \in \mathbb{R}^{n\times p}$ is also large and sparse. How can the Lanczos process be used to find the stationary values of

$$r(x) = \frac{x^T A x}{x^T x}$$

subject to the constraint $C^T x = 0$? Assume that a sparse QR factorization $C = QR$ is available.


P12.6.5 Relate the eigenvalues and eigenvectors of

$$A = \begin{bmatrix} 0 & A_1 & 0 & 0 \\ 0 & 0 & A_2 & 0 \\ 0 & 0 & 0 & A_3 \\ A_4 & 0 & 0 & 0 \end{bmatrix}$$

to the eigenvalues and eigenvectors of $\hat{A} = A_1 A_2 A_3 A_4$. Assume that the diagonal blocks in $A$ are square.


P12.6.6 Prove that if (12.6.2) holds, then (12.6.3) converges to $\lambda_{\min}(T)$ monotonically from the right.


P12.6.7 Recall from §4.7 that it is possible to compute the inverse of a symmetric positive definite Toeplitz matrix in $O(n^2)$ flops. Use this fact to obtain an initial bracketing interval for (12.6.5) that is based on $\|T^{-1}\|$ and $\|G^{-1}\|$.


P12.6.8 A matrix $A \in \mathbb{R}^{n\times n}$ is centrosymmetric if it is symmetric and persymmetric, i.e., $A = E_n A E_n$ where $E_n = I_n(:, n{:}-1{:}1)$. Show that if $n = 2m$ and $Q$ is the orthogonal matrix

$$Q = \frac{1}{\sqrt{2}} \begin{bmatrix} I_m & I_m \\ E_m & -E_m \end{bmatrix},$$

then

$$Q^T A Q = \begin{bmatrix} A_{11} + A_{12} E_m & 0 \\ 0 & A_{11} - A_{12} E_m \end{bmatrix},$$

where $A_{11} = A(1{:}m, 1{:}m)$ and $A_{12} = A(1{:}m, m+1{:}n)$. Show that if $n = 2m$, then the Schur decomposition of a centrosymmetric matrix can be computed with one-fourth the flops that it takes to compute the Schur decomposition of a symmetric matrix, assuming that the QR algorithm is used in both cases. Repeat the problem if $n = 2m + 1$.
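The block diagonalization claimed in P12.6.8 can be spot-checked numerically before attempting a proof. This minimal sketch (an illustration, not a solution) builds a random centrosymmetric matrix by averaging a symmetric $B$ with $E_n B E_n$, forms the $Q$ of the problem with `np.block`, and compares $Q^T A Q$ with the two stated diagonal blocks; the value $m = 4$ and the random seed are arbitrary choices.

```python
import numpy as np

m = 4
n = 2 * m
rng = np.random.default_rng(2)
E_m = np.eye(m)[:, ::-1]              # exchange matrix E_m = I_m(:, m:-1:1)
E_n = np.eye(n)[:, ::-1]

# A random centrosymmetric matrix: symmetric B averaged with E_n B E_n
B = rng.standard_normal((n, n))
B = B + B.T
A = 0.5 * (B + E_n @ B @ E_n)

Q = np.block([[np.eye(m), np.eye(m)], [E_m, -E_m]]) / np.sqrt(2)
T = Q.T @ A @ Q
A11, A12 = A[:m, :m], A[:m, m:]

print(np.allclose(T[:m, m:], 0.0) and np.allclose(T[m:, :m], 0.0))  # off-diagonal blocks vanish
print(np.allclose(T[:m, :m], A11 + A12 @ E_m))
print(np.allclose(T[m:, m:], A11 - A12 @ E_m))
```

Since $Q$ is orthogonal, the eigenvalues of $A$ are then the union of the eigenvalues of the two m-by-m diagonal blocks.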


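P12.6.9 Suppose $F, G \in \mathbb{R}^{n\times n}$ are symmetric and that

$$Q = [\, \underbrace{Q_1}_{p} \;\; \underbrace{Q_2}_{n-p} \,]$$

is an n-by-n orthogonal matrix. Show how to compute $Q$ and $p$ so that

$$f(Q, p) = \operatorname{tr}(Q_1^T F Q_1) + \operatorname{tr}(Q_2^T G Q_2)$$

is maximized. Hint: $\operatorname{tr}(Q_1^T F Q_1) + \operatorname{tr}(Q_2^T G Q_2) = \operatorname{tr}(Q_1^T (F - G) Q_1) + \operatorname{tr}(G)$.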
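The hint rests on $Q_2 Q_2^T = I - Q_1 Q_1^T$, so that $\operatorname{tr}(Q_2^T G Q_2) = \operatorname{tr}(G) - \operatorname{tr}(Q_1^T G Q_1)$. A minimal numerical check of the identity, assuming a random orthogonal $Q$ obtained from `np.linalg.qr` and arbitrary sizes $n = 6$, $p = 2$:

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 6, 2
F = rng.standard_normal((n, n))
F = F + F.T                                       # symmetric F
G = rng.standard_normal((n, n))
G = G + G.T                                       # symmetric G
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))  # a random orthogonal Q
Q1, Q2 = Q[:, :p], Q[:, p:]

lhs = np.trace(Q1.T @ F @ Q1) + np.trace(Q2.T @ G @ Q2)
rhs = np.trace(Q1.T @ (F - G) @ Q1) + np.trace(G)
print(np.isclose(lhs, rhs))                       # expected: True
```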
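P12.6.10 Suppose $A \in \mathbb{R}^{n\times n}$ is given and consider the problem of minimizing $\|A - S\|_F$ over all symmetric positive semidefinite matrices $S$ that have rank $r$ or less. Show that

$$S = \sum_{i=1}^{\min(k, r)} \lambda_i q_i q_i^T$$

solves this problem, where

$$\frac{A + A^T}{2} = Q \,\mathrm{diag}(\lambda_1, \ldots, \lambda_n)\, Q^T$$

is the Schur decomposition of $A$'s symmetric part, $Q = [q_1, \ldots, q_n]$, and

$$\lambda_1 \ge \cdots \ge \lambda_k > 0 \ge \lambda_{k+1} \ge \cdots \ge \lambda_n.$$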
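The formula in P12.6.10 is easy to evaluate with a symmetric eigensolver. The sketch below forms $S$ from the eigendecomposition of $(A + A^T)/2$ (reordered so that $\lambda_1 \ge \cdots \ge \lambda_n$) and then compares it against random feasible competitors $YY^T$ with $Y \in \mathbb{R}^{n\times r}$; this is a spot check under arbitrary choices of $n$, $r$, and seed, not a proof, and the function name `psd_rank_r_approx` is hypothetical.

```python
import numpy as np

def psd_rank_r_approx(A, r):
    """S = sum_{i <= min(k, r)} lambda_i q_i q_i^T from the Schur decomposition of
    the symmetric part (A + A^T)/2, where k is the number of positive eigenvalues."""
    lam, Q = np.linalg.eigh(0.5 * (A + A.T))   # eigh returns ascending eigenvalues
    lam, Q = lam[::-1], Q[:, ::-1]             # reorder so lambda_1 >= ... >= lambda_n
    k = int(np.count_nonzero(lam > 0))
    p = min(k, r)
    return (Q[:, :p] * lam[:p]) @ Q[:, :p].T   # scale columns by lambda_i, then Q D Q^T

rng = np.random.default_rng(3)
n, r = 6, 2
A = rng.standard_normal((n, n))
S = psd_rank_r_approx(A, r)
best = np.linalg.norm(A - S, "fro")

# Random feasible competitors Y Y^T (symmetric PSD, rank <= r) should never do better
gaps = [np.linalg.norm(A - Y @ Y.T, "fro") - best
        for Y in rng.standard_normal((500, n, r))]
print(min(gaps) >= -1e-12)                     # expected: True
```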


P12.6.11 Verify for general n (even) that $H$ is similar to $H_o H_e$, where these matrices are defined in §12.6.4.


P12.6.12 Verify that the bidiagonal matrices $B_C(1{:}m, 1{:}m)$ and $B_S(1{:}m, 1{:}m)$ in §12.6.4 have nonzero entries on their diagonal and superdiagonal and specify their values.
P12.6.13 A real 2n-by-2n matrix of the form

$$M = \begin{bmatrix} A & F \\ G & -A^T \end{bmatrix}$$

is Hamiltonian if $A \in \mathbb{R}^{n\times n}$ and $F, G \in \mathbb{R}^{n\times n}$ are symmetric. Equivalently, if the orthogonal matrix $J$ is defined by

$$J = \begin{bmatrix} 0 & I_n \\ -I_n & 0 \end{bmatrix},$$

then $M \in \mathbb{R}^{2n\times 2n}$ is Hamiltonian if and only if $J^T M J = -M^T$. (a) Show that the eigenvalues of a Hamiltonian matrix come in plus-minus pairs. (b) A matrix $S \in \mathbb{R}^{2n\times 2n}$ is symplectic if $J^T S J = S^{-T}$. Show that if $S$ is symplectic and $M$ is Hamiltonian, then $S^{-1} M S$ is also Hamiltonian. (c) Show that if $Q \in \mathbb{R}^{2n\times 2n}$ is orthogonal and symplectic, then

$$Q = \begin{bmatrix} Q_1 & Q_2 \\ -Q_2 & Q_1 \end{bmatrix},$$

where $Q_1^T Q_1 + Q_2^T Q_2 = I_n$ and $Q_2^T Q_1$ is symmetric. Thus, a Givens rotation of the form $G(i, i+n, \theta)$ is orthogonal symplectic, as is the direct sum of n-by-n Householders. (d) Show how to compute a symplectic orthogonal $U$ such that

$$U^T M U = \begin{bmatrix} H & R \\ D & -H^T \end{bmatrix},$$

where $H$ is upper Hessenberg and $D$ is diagonal.
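The structural facts in P12.6.13 can be explored numerically. The sketch below, with arbitrary size $n = 3$ and seed, builds a random Hamiltonian $M$, checks $J^T M J = -M^T$, checks that every $-\lambda$ is (to roundoff) also an eigenvalue, and checks that a Givens rotation acting in the $(i, i+n)$ plane satisfies $Q^T J Q = J$. The helper `givens` is hypothetical and follows the convention $G(i,i) = G(k,k) = c$, $G(i,k) = s$, $G(k,i) = -s$; the opposite sign convention would pass the same check.

```python
import numpy as np

n = 3
rng = np.random.default_rng(4)
A = rng.standard_normal((n, n))
F = rng.standard_normal((n, n))
F = F + F.T                                   # symmetric F
G = rng.standard_normal((n, n))
G = G + G.T                                   # symmetric G
M = np.block([[A, F], [G, -A.T]])

I, Z = np.eye(n), np.zeros((n, n))
J = np.block([[Z, I], [-I, Z]])

# Structural test J^T M J = -M^T
print(np.allclose(J.T @ M @ J, -M.T))

# Plus-minus pairing: distance from each -mu to the nearest eigenvalue
lam = np.linalg.eigvals(M)
pair_err = max(np.min(np.abs(lam + mu)) for mu in lam)
print(pair_err < 1e-6)

def givens(N, i, k, theta):
    """N-by-N rotation in the (i, k) plane (hypothetical helper)."""
    c, s = np.cos(theta), np.sin(theta)
    Q = np.eye(N)
    Q[i, i] = Q[k, k] = c
    Q[i, k], Q[k, i] = s, -s
    return Q

# A rotation G(i, i+n, theta) is orthogonal and satisfies Q^T J Q = J
Qg = givens(2 * n, 0, n, 0.7)
print(np.allclose(Qg.T @ J @ Qg, J))
```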


Notes and References for Sec. 12.6 


The inverse eigenvalue problems discussed in §12.6.1 and §12.6.2 appear in the following survey articles:


G.H. Golub (1973). “Some Modified Matrix Eigenvalue Problems,” SIAM Review 15, 
318-44. 

D. Boley and G.H. Golub (1987). “A Survey of Matrix Inverse Eigenvalue Problems,”
Inverse Problems 3, 595-622.


References for the stationary value problem include 


G.E. Forsythe and G.H. Golub (1965). “On the Stationary Values of a Second-Degree 
Polynomial on the Unit Sphere,” SIAM J. App. Math. 13, 1050-68. 

G.H. Golub and R. Underwood (1970). “Stationary Values of the Ratio of Quadratic 
Forms Subject to Linear Constraints,” Z. Angew. Math. Phys. 21, 318-26. 

S. Leon (1994). “Maximizing Bilinear Forms Subject to Linear Constraints,” Lin. Alg. 
and Its Applic. 210, 49-58. 


An algorithm for minimizing $x^T A x$ where $x$ satisfies $Bx = d$ and $\|x\|_2 = 1$ is presented in


W. Gander, G.H. Golub, and U. von Matt (1991). “A Constrained Eigenvalue Problem,” 
in Numerical Linear Algebra, Digital Signal Processing, and Parallel Algorithms, 
G.H. Golub and P. Van Dooren (eds), Springer-Verlag, Berlin. 


Selected papers that discuss a range of inverse eigenvalue problems include 


G.H. Golub and J.H. Welsch (1969). “Calculation of Gauss Quadrature Rules,” Math. 
Comp. 23, 221-30. 

S. Friedland (1975). “On Inverse Multiplicative Eigenvalue Problems for Matrices,” Lin. 
Alg. and Its Applic. 12, 127-38. 
D.L. Boley and G.H. Golub (1978). “The Matrix Inverse Eigenvalue Problem for Periodic
Jacobi Matrices,” in Proc. Fourth Symposium on Basic Problems of Numerical
Mathematics, Prague, pp. 63-76.

W.E. Ferguson (1980). “The Construction of Jacobi and Periodic Jacobi Matrices with 
Prescribed Spectra,” Math. Comp. 35, 1203-1220. 

J. Kautsky and G.H. Golub (1983). “On the Calculation of Jacobi Matrices,” Lin. Alg. 
and Its Applic. 52/53, 439-456. 

D. Boley and G.H. Golub (1984). “A Modified Method for Restructuring Periodic Jacobi 
Matrices,” Math. Comp. 42, 143-150. 

W.B. Gragg and W.J. Harrod (1984). “The Numerically Stable Reconstruction of Jacobi
Matrices from Spectral Data,” Numer. Math. 44, 317-336.

S. Friedland, J. Nocedal, and M.L. Overton (1987). “The Formulation and Analysis of 
Numerical Methods for Inverse Eigenvalue Problems,” SIAM J. Numer. Anal. 24, 
634-667. 

M.T. Chu (1992). “Numerical Methods for Inverse Singular Value Problems,” SIAM J. 
Num. Anal. 29, 885-903. 

G. Ammar and G. He (1995). “On an Inverse Eigenvalue Problem for Unitary Matrices,”
Lin. Alg. and Its Applic. 218, 263-271.

H. Zha and Z. Zhang (1995). “A Note on Constructing a Symmetric Matrix with Specified
Diagonal Entries and Eigenvalues,” BIT 35, 448-451.


Various Toeplitz eigenvalue computations are presented in 


G. Cybenko and C. Van Loan (1986). “Computing the Minimum Eigenvalue of a Symmetric
Positive Definite Toeplitz Matrix,” SIAM J. Sci. and Stat. Comp. 7, 123-131.

W.F. Trench (1989). “Numerical Solution of the Eigenvalue Problem for Hermitian
Toeplitz Matrices,” SIAM J. Matrix Anal. Appl. 10, 135-146.

L. Reichel and L.N. Trefethen (1992). “Eigenvalues and Pseudo-eigenvalues of Toeplitz
Matrices,” Lin. Alg. and Its Applic. 162/163/164, 153-186.

S.L. Handy and J.L. Barlow (1994). “Numerical Solution of the Eigenproblem for 
Banded, Symmetric Toeplitz Matrices,” SIAM J. Matrix Anal. Appl. 15, 205-214. 


Unitary/orthogonal eigenvalue problems are treated in


H. Rutishauser (1966). “Bestimmung der Eigenwerte Orthogonaler Matrizen,” Numer.
Math. 9, 104-108.

P.J. Eberlein and C.P. Huang (1975). “Global Convergence of the QR Algorithm for
Unitary Matrices with Some Results for Normal Matrices,” SIAM J. Numer. Anal.
12, 421-453.

G. Cybenko (1985). “Computing Pisarenko Frequency Estimates,” in Proceedings of
the Princeton Conference on Information Science and Systems, Dept. of Electrical
Engineering, Princeton University.

W.B. Gragg (1986). “The QR Algorithm for Unitary Hessenberg Matrices,” J. Comp.
Appl. Math. 16, 1-8.

G.S. Ammar, W.B. Gragg, and L. Reichel (1985). “On the Eigenproblem for Orthogonal
Matrices,” Proc. IEEE Conference on Decision and Control, 1963-1966.

W.B. Gragg and L. Reichel (1990). “A Divide and Conquer Method for Unitary and
Orthogonal Eigenproblems,” Numer. Math. 57, 695-718.


Hamiltonian eigenproblems (see P12.6.13) occur throughout optimal control theory and 
are very important. 


C.C. Paige and C. Van Loan (1981). “A Schur Decomposition for Hamiltonian Matrices,”
Lin. Alg. and Its Applic. 41, 11-32.

C. Van Loan (1984). “A Symplectic Method for Approximating All the Eigenvalues of
a Hamiltonian Matrix,” Lin. Alg. and Its Applic. 61, 233-252.
R. Byers (1986). “A Hamiltonian QR Algorithm,” SIAM J. Sci. and Stat. Comp. 7,
212-229.

V. Mehrmann (1988). “A Symplectic Orthogonal Method for Single Input or Single
Output Discrete Time Optimal Quadratic Control Problems,” SIAM J. Matrix Anal.
Appl. 9, 221-247.

G. Ammar and V. Mehrmann (1991). “On Hamiltonian and Symplectic Hessenberg
Forms,” Lin. Alg. and Its Applic. 149, 55-72.

A. Bunse-Gerstner, R. Byers, and V. Mehrmann (1992). “A Chart of Numerical Methods
for Structured Eigenvalue Problems,” SIAM J. Matrix Anal. Appl. 13, 419-453.


Other papers on modified/structured eigenvalue problems include 


A. Bunse-Gerstner and W.B. Gragg (1988). “Singular Value Decompositions of Complex 
Symmetric Matrices,” J. Comp. Applic. Math. 21, 41-54. 

R. Byers (1988). “A Bisection Method for Measuring the Distance of a Stable Matrix to 
the Unstable Matrices,” SIAM J. Sci. Stat. Comp. 9, 875-881. 

J.W. Demmel and W. Gragg (1993). “On Computing Accurate Singular Values and 
Eigenvalues of Matrices with Acyclic Graphs,” Lin. Alg. and Its Applic. 185, 203- 
217. 

A. Bunse-Gerstner, R. Byers, and V. Mehrmann (1993). “Numerical Methods for Simultaneous
Diagonalization,” SIAM J. Matrix Anal. Appl. 14, 927-949.
powers, 569

range of, 49 

rank of, 49 

sign function, 372 

transpose, 3 
Matrix functions, 555ff 

integrating, 569-70 

Jordan decomposition and, 557-8 

polynomial evaluation, 568-9 
Matrix norms, 54ff 

consistency, 55 

Frobenius, 55 

relations between, 56 

subordinate, 56 
Matrix times matrix 

block, 25-7, 29-30 

dot version, 11 

outer product version, 13 

parallel, 292ff 

saxpy version, 12 

shared memory, 292-3 

torus, 293-9 
Matrix times vector, 5-6

block version, 28 
Message-passing, 276-7 
Minimax theorem for 

symmetric eigenvalues, 394 

singular values, 449 
MINRES, 494 
Mixed precision, 127 
Modified eigenproblems, 621-3 
Modified Gram-Schmidt, 231-2, 241 
Modified LR algorithm, 361 
Moore-Penrose conditions, 257-8 
Multiple eigenvalues, 

and Lanczos tridiagonalization, 485 

and matrix functions, 560-1 
Multiple right hand sides, 91, 121 
Multiplicity of eigenvalues, 316 
Multipliers, 96 


Neighbor, 276 

Netlib, xiv 

Network topology, 276 

Node program, 285 

Nonderogatory matrices, 349 
Nonsingular, 50 

Normal equations, 237-9, 545-7 

Normal matrix, 313-4 

Normality and eigenvalue condition, 323 
Norms 


matrix, 54ff 
vector, 52ff 
Notation 


block matrices, 24-5 
colon, 7, 19, 27 
matrix, 3 
submatrix, 27 
vector, 4

x-o, 16 
Null, 49 
Null space, 49 

intersection of, 602-3 
Numerical rank and SVD, 260-2


Off, 426 
Operation count. See Work or 
particular algorithm, 
Orthogonal 
basis, 69 
complement, 69 
matrix, 208 
Procrustes problem, 601 
projection, 75 
Orthogonal iteration 
Ritz acceleration and, 422 
symmetric, 410-1 
unsymmetric, 332-4 
Orthogonal matrix representations 
WY block form, 213-5 
factored form, 212-3 
Givens rotations, 217-8 
Orthonormal basis computation, 229-32 
Orthonormality, 69 
Outer product, 8 
Overdetermined system, 236 
Overflow, 61 
Overwriting, 23 


Pade approximation, 572-4 
Parallel computation 
gaxpy 
message passing ring, 279 
shared memory (dynamic), 289-90 
shared memory (static), 287 
Cholesky 
message passing ring, 300 
divide and conquer, 445-6 
Jacobi, 431-4 
matrix multiplication 
shared memory, 292-3 
torus, 293-9 
Parlett-Reid method, 162-3 
Partitioned matrix, 6 
Pencils, 375 
diagonalization of, 461-2
equivalence of, 376 
symmetric-definite, 461 
Permutation matrices, 109-10 
Persymmetric matrix, 193 
Perturbation theory for 
eigenvalues, 320-4 
eigenvalues (symmetric case), 395-7 
eigenvectors, 326-7 


eigenvectors (symmetric case), 399-400


generalized eigenvalue, 377-8 

invariant subspaces 
symmetric case, 397-99
unsymmetric case, 324-5 

least squares problem, 242-4 
linear equation problem, 80ff
pseudo-inverse, 258 
singular subspace pair, 450-1 
singular values, 449-50
underdetermined systems, 272-3 
Pipelining, 35-6 
Pivoting, 109 
Aasen, 166 
column, 248-50 
complete, 117 
partial, 110 
symmetric matrices and, 148 
Pivots, 97 
condition and, 107 
zero, 103 
Plane rotations. See Givens rotations, 
p-norms, 52 
minimization in, 236 
Polar decomposition, 149 
Polynomial preconditioner, 539-40 
Positive definite systems, 140-1
Gauss-Seidel and, 512 
LDL^T and, 142
properties of, 141 
unsymmetric, 142 
Power method, 330-2 
symmetric case, 405-6
Power series of matrix, 565 
Powers of a matrix, 569 
Preconditioned conjugate 
gradient method, 532ff 
Pre-conditioners 
incomplete block, 536-7
incomplete Cholesky, 535 
polynomial, 539-40
unsymmetric case, 550 
Principal angles and vectors, 603-4
Processor id, 276 
Procrustes problem, 601 
Projections, 75 
Pseudo-eigenvalues, 576-7 
Pseudo-inverse, 257 


QMR, 551 

QR algorithm for eigenvalues 
symmetric version, 414ff 
unsymmetric version, 352ff 

QR factorization, 223ff 
Block Householder 

computation, 225-6 

Classical Gram-Schmidt and, 230-1
column pivoting and, 248-50, 591
Fast Givens computation of, 228-9 
Givens computation of, 226-7 
Hessenberg matrices and, 227-8 
Householder computation of, 224-5 
least square problem and, 239-42
Modified Gram-Schmidt and, 231-2
properties of, 229-30
rank of matrix and, 248 
square systems and, 270-1 
tridiagonal matrix and, 417 
underdetermined systems and, 271-2
updating, 607-13 

Quadratic form, 394 

QZ algorithm, 384ff 


Range, 49 
Rank of matrix, 49 
determination of, 259 
QR factorization and, 248 
subset selection and, 591-4 
SVD and, 72-3 
Rank deficient LS problem, 256ff 
Rank-one modification 
of diagonal matrix, 442-4 
eigenvalues and, 397 
QR factorization and, 607-13 
Rayleigh quotient iteration, 408-9 
QR algorithm and, 422 
symmetric-definite pencils and, 465 
R-bidiagonalization, 552-3 
Re, 14 
Real Schur decomposition, 341
generalized, 377 
recv, 277
Rectangular LU, 102 
Relaxation parameter, 514 
Residuals vs. accuracy, 124
Restarting 
Arnoldi method and, 501-3 
GMRES and, 549 
Lanczos and, 584 
Ridge regression, 583-5 
Ring, 276 
Ring algorithms 
Cholesky, 300-3 
Jacobi eigensolver, 434 
Ritz, 
acceleration, 334 
pairs and Arnoldi method, 500 
pairs and Lanczos method, 475 
Rotation of subspaces, 601 
Rounding errors, See 
particular algorithm. 
Roundoff error analysis, 62-7 
Row addition or deletion, 610-1
Row partition, 6 
Row scaling, 125 
Row weighting in LS problem, 265 


Saxpy, 4, 5

Scaling 
linear systems and, 125 

Scaling and squaring for exp(A), 573-4 

Schmidt orthogonalization, See 
Gram-Schmidt, 

Schur complement, 103 

Schur decomposition, 313 
generalized, 377 
matrix functions and, 558-61 
normal matrices and, 313-4 
real matrices and, 341-2
symmetric matrices and, 393 
two-by-two symmetric, 427-8
Schur vectors, 313 
Search directions, 521ff 
Secular equations, 443, 582 
Selective orthogonalization, 483-4
Semidefinite systems, 147-9 


send, 277 

Sensitivity. See Perturbation 
theory for. 

Sep, 325 


Serious breakdown, 505 
Shared memory traffic, 287 
Shared memory systems, 285-9 
Sherman-Morrison formula, 50 
Shifts in 
QR algorithm, 353, 356 
QZ algorithm, 382-3 
SVD algorithm, 452 
symmetric QR algorithm, 418-20 
Sign function, 372 
Similarity transformation, 311 
condition of, 317 
nonunitary, 314, 317 
Simpson’s rule, 570 
Simultaneous diagonalization, 461-3
Simultaneous iteration, See 
LR iteration, orthogonal iteration 
Treppeniteration, 
Sine of matrix, 566 
Single shift QR iteration, 354-5 
Singular matrix, 50 
Singular value decomposition (SVD), 70-3 
algorithm for, 253-4, 448, 452 
constrained least squares and, 582-3 
generalized, 465-7 
Lanczos method for, 495-6 
Linear systems and, 80 
numerical rank and, 260-2 
null space and, 71, 602-3 
projections and, 75 
proof of, 70 
pseudo-inverse and, 257 
rank of matrix and, 71 
ridge regression and, 583-5 
subset selection and, 591-4
subspace intersection and, 604-5
subspace rotation and, 601 
total least squares and, 596-8 
Singular values 
eigenvalues and, 318 
interlacing properties, 449-50 
minimax characterization, 449 
perturbation of, 450-1 
Singular vectors, 70-1
Span, 49 
Spectral radius, 511 
Spectrum, 310 
Speed-up, 281 
Splitting, 511 
Square root of a matrix, 149 
S-step Lanczos, 487 
Static Scheduling, 286 
Stationary values, 621
Steepest descent and conjugate 
gradients, 520ff 
Store by 
band, 19-20 
block, 45 
diagonal, 21-3 
Stride, 38-40 
Strassen method, 31-3, 66 
Structure exploitation, 16-24 
Sturm sequences, 440 
Submatrix, 27 
Subordinate norm, 56 
Subset selection, 590 
Subspace, 49 
angles between, 603-4 
basis for, 49 
deflating, 381, 386 
dimension, 49 
distance between, 76-7 
intersection, 604-5
invariant, 372, 307-403
null space intersection, 602-3 
orthogonal projections onto, 
rotation of, 601 
Successive over-relaxation (SOR), 514 
Symmetric eigenproblem, 391ff 
Symmetric indefinite systems, 161ff 
Symmetric positive definite systems, 
Lanczos and, ff 
Symmetric storage, 20-2 
Symmetric successive over-relaxation, 
(SSOR), 516-7 
sym.schur, 427 
SYMMLQ, 494 
Sweep, 429 
Sylvester equation, 366-9 
Sylvester law of inertia, 403 


Taylor approximation of e^A, 565-7
Threshold Jacobi, 436
Toeplitz matrix methods, 193ff
Torus, 276
Total least squares, 595ff
Trace, 310 
Transformation matrices 
Fast Givens, 218-21 
Gauss, 94-5 
Givens, 215 
Householder, 209 
Hyperbolic, 611-2 
Trench algorithm, 199 
Treppeniteration, 335-6 
Triangular matrices, 93 
multiplication between, 17 
unit, 92 
Triangular systems, 88ff 
band, 153-4 
multiple, 91 
non-square, 92 
Tridiagonalization, 
Householder, 414 
Krylov subspaces and, 416
Lanczos, 472ff 
Tridiagonal matrices, 416 
inverse of, 537 
QR algorithm and, 417ff 
Tridiagonal systems, 156-7 


ULV updating, 613-8 
Underdetermined systems, 271-3 
Underflow, 61 

Unit roundoff, 61 

Unit stride, 38-40 

Unitary matrix, 73 

Unreduced Hessenberg matrices, 346 
Unsymmetric eigenproblem, 308ff 
Unsymmetric Lanczos method, 503-6 


Unsymmetric positive definite systems, 142 


Updating the QR factorization, 606-13


Vandermonde systems, 183-8 
Variance-covariance matrix, 245-6 
Vector length issue, 37-8 
Vector notation, 4 
Vector norms, 52ff 
Vector operations, 4, 36 
Vector touch, 41-2 
Vector computing 
models, 37 
operations, 4, 36 
pipelining, 35-6 
Vectorization, 34ff, 157-8 


Weighting 
column, 264-5 
row, 586 
See also Scaling, 
Wielandt-Hoffman theorem for 
eigenvalues, 395 
singular values, 450 
Wilkinson shift, 418 
Work 
least squares methods, 263 
linear system methods, 270 
SVD and, 254 
Workspace, 23 
Wrap mapping, 278 
WY representation, 213-5 


Yule-Walker problem, 194 


